Multi-modal Affect Analysis using standardized data within subjects in the Wild
Format | Journal Article |
---|---|
Language | English |
Published | 07.07.2021 |
Summary: | Human affect recognition is an important factor in human-computer
interaction. However, methods developed on in-the-wild data are not yet
accurate enough for practical use. In this paper, we introduce an affect
recognition method focusing on facial expression (EXP) recognition and
valence-arousal estimation that was submitted to the Affective Behavior
Analysis in-the-Wild (ABAW) 2021 contest.
When annotating facial expressions in a video, we assumed that annotators
judge them not only from features common to all people, but also from
relative changes within each individual's time series. Therefore, after
learning common per-frame features, we built a facial expression estimation
model and a valence-arousal model on time-series data that combine the common
features with features standardized within each video. Furthermore, these
features were learned from multi-modal data such as image features, action
units (AUs), head pose, and gaze. On the validation set, our model achieved a
facial expression score of 0.546. These validation results show that the
proposed framework effectively improves estimation accuracy and robustness. |
---|---|
DOI: | 10.48550/arxiv.2107.03009 |
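The abstract's central idea, standardizing per-frame features within each video (i.e., within subject) and combining them with the common features before a time-series model, can be sketched as follows. This is a minimal illustration under my own assumptions: the GRU head, the 7-class expression output, and all names and tensor shapes are hypothetical, not the authors' reported architecture.

```python
# Sketch of within-video (within-subject) feature standardization feeding a
# sequence model, as described in the abstract. The GRU, head sizes, and all
# identifiers below are assumptions for illustration, not the paper's model.
import torch
import torch.nn as nn


def standardize_within_video(feats: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Z-score features over the time axis of a single video.

    feats: (T, D) per-frame multi-modal features (image embedding, AUs,
    head pose, gaze) for one video. Returns features centered and scaled by
    that video's own statistics, so downstream layers see changes relative
    to the individual subject rather than absolute values.
    """
    mean = feats.mean(dim=0, keepdim=True)
    std = feats.std(dim=0, keepdim=True)
    return (feats - mean) / (std + eps)


class AffectSequenceModel(nn.Module):
    """Hypothetical time-series model: common + standardized features in,
    expression logits and valence-arousal out."""

    def __init__(self, feat_dim: int, hidden: int = 256, n_expressions: int = 7):
        super().__init__()
        # Input is the concatenation of common and per-video standardized features.
        self.gru = nn.GRU(feat_dim * 2, hidden, batch_first=True)
        self.expr_head = nn.Linear(hidden, n_expressions)  # EXP logits
        self.va_head = nn.Linear(hidden, 2)                # valence, arousal

    def forward(self, common: torch.Tensor):
        # common: (B, T, D) per-frame features shared across subjects.
        standardized = torch.stack(
            [standardize_within_video(v) for v in common])  # (B, T, D)
        x = torch.cat([common, standardized], dim=-1)       # (B, T, 2D)
        h, _ = self.gru(x)
        # tanh keeps valence/arousal predictions in [-1, 1].
        return self.expr_head(h), torch.tanh(self.va_head(h))


# Usage: 2 videos, 100 frames each, 64-dim fused multi-modal features per frame.
model = AffectSequenceModel(feat_dim=64)
frames = torch.randn(2, 100, 64)
expr_logits, valence_arousal = model(frames)
print(expr_logits.shape, valence_arousal.shape)  # (2, 100, 7) (2, 100, 2)
```

Standardizing over each video's own time axis makes the model sensitive to changes relative to a subject's baseline, which is the motivation the abstract gives for combining standardized with common features.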