MMA-MRNNet: Harnessing Multiple Models of Affect and Dynamic Masked RNN for Precise Facial Expression Intensity Estimation
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 28.02.2023 |
Subjects | |
Summary: | This paper presents MMA-MRNNet, a novel deep learning architecture for
dynamic multi-output Facial Expression Intensity Estimation (FEIE) from video
data. Traditional approaches to this task often rely on complex 3-D CNNs, which
require extensive pre-training and assume that facial expressions are uniformly
distributed across all frames of a video. These methods struggle to handle
videos of varying lengths, often resorting to ad-hoc strategies that either
discard valuable information or introduce bias. MMA-MRNNet addresses these
challenges through a two-stage process. First, the Multiple Models of Affect
(MMA) extractor component is a Multi-Task Learning CNN that concurrently
estimates valence-arousal, recognizes basic facial expressions, and detects
action units in each frame. These representations are then processed by a
Masked RNN component, which captures temporal dependencies and dynamically
updates weights according to the true length of the input video, ensuring that
only the most relevant features are used for the final prediction. The proposed
MMA-MRNNet, a unimodal, non-ensemble model, was evaluated on the Hume-Reaction
dataset, where it surpassed state-of-the-art methods by a wide margin,
regardless of whether they were
unimodal, multimodal, or ensemble approaches. Finally, we demonstrated the
effectiveness of the MMA component of our proposed method across multiple
in-the-wild datasets, where it consistently outperformed all state-of-the-art
methods across various metrics. |
---|---|
DOI: | 10.48550/arxiv.2303.00180 |
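
The summary describes a Masked RNN that uses the true length of each video rather than its padded length. The sketch below illustrates one common way to get that behaviour: a mask freezes the hidden state on padded frames. It is not the authors' implementation. The plain tanh (Elman) cell, the random weights, the 4-dimensional per-frame feature vector standing in for the MMA outputs, and the name `masked_rnn_final_state` are all assumptions made for illustration.

```python
import math
import random


def masked_rnn_final_state(frames, true_length, w_in, w_rec):
    """Run a tanh RNN over padded frames; padded steps leave the state unchanged.

    frames      -- list of per-frame feature vectors, padded to a fixed length
    true_length -- number of real (non-padded) frames at the start of `frames`
    w_in, w_rec -- input and recurrent weight matrices (hypothetical values)
    """
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    for t, x in enumerate(frames):
        # mask is 1 for real frames and 0 for padding
        mask = 1.0 if t < true_length else 0.0
        candidate = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
        # keep the previous state wherever the mask is 0
        h = [mask * c + (1.0 - mask) * old for c, old in zip(candidate, h)]
    return h


random.seed(0)
FEAT_DIM, HIDDEN, MAX_LEN = 4, 3, 6  # illustrative sizes, not from the paper
w_in = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(HIDDEN)]
w_rec = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

# A 3-frame "video", padded to MAX_LEN once with zeros and once with junk values
video = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(3)]
zero_padded = video + [[0.0] * FEAT_DIM for _ in range(MAX_LEN - len(video))]
noise_padded = video + [[9.9] * FEAT_DIM for _ in range(MAX_LEN - len(video))]

h_ref = masked_rnn_final_state(video, len(video), w_in, w_rec)
h_zero = masked_rnn_final_state(zero_padded, len(video), w_in, w_rec)
h_noise = masked_rnn_final_state(noise_padded, len(video), w_in, w_rec)

# Whether the padding content affected the final state
print(h_ref == h_zero == h_noise)
```

Because the mask selects the previous state on every padded step, the final state depends only on the real frames, so padding neither discards information nor biases the prediction. A full model would presumably feed this final state to a regression head for the intensity outputs; that part is omitted here.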