End-to-End Modeling and Transfer Learning for Audiovisual Emotion Recognition in-the-Wild

Bibliographic Details
Published in: Multimodal Technologies and Interaction, Vol. 6, No. 2, p. 11
Main Authors: Dresvyanskiy, Denis; Ryumina, Elena; Kaya, Heysem; Markitantov, Maxim; Karpov, Alexey; Minker, Wolfgang
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.02.2022
Summary: As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention over the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild”, data. This work investigates audiovisual deep learning approaches to the in-the-wild emotion recognition problem. Inspired by the outstanding performance of end-to-end and transfer learning techniques, we explored the effectiveness of architectures in which a modality-specific Convolutional Neural Network (CNN) is followed by a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN), using the AffWild2 dataset under the Affective Behavior Analysis in-the-Wild (ABAW) challenge protocol. We deployed unimodal end-to-end and transfer learning approaches within a multimodal fusion system, which generated final predictions via a weighted score fusion scheme. With the proposed deep-learning-based multimodal system, we reached a test-set challenge performance measure of 48.1% on the ABAW 2020 Facial Expressions challenge, surpassing the performance of the first runner-up.
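
The pipeline the abstract describes, a modality-specific CNN encoding each frame, an LSTM modeling the sequence, and per-modality scores combined by weighted score fusion, can be illustrated with a minimal PyTorch sketch. Everything below (the small stand-in backbone, feature sizes, the 7-class expression head, and the 0.6/0.4 fusion weights) is an illustrative assumption, not the authors' published configuration.

```python
# Hypothetical sketch of a CNN -> LSTM unimodal branch plus weighted score
# fusion; shapes, class count, and weights are assumptions for illustration.
import torch
import torch.nn as nn

class CnnLstmEmotionNet(nn.Module):
    """Unimodal model: per-frame CNN features followed by an LSTM over time."""
    def __init__(self, num_classes: int = 7, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(  # stand-in frame encoder
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # per-frame features
        out, _ = self.lstm(feats)        # temporal modeling across the clip
        return self.head(out[:, -1])     # class logits from the last time step

def weighted_score_fusion(scores, weights):
    """Weighted sum of per-modality class probabilities; returns hard labels."""
    fused = sum(w * s.softmax(dim=-1) for w, s in zip(weights, scores))
    return fused.argmax(dim=-1)

# Toy usage: fuse scores from a visual and an audio branch. Treating audio as
# image-like spectrogram stacks and the 0.6/0.4 weights are assumptions.
visual_net, audio_net = CnnLstmEmotionNet(), CnnLstmEmotionNet()
video = torch.randn(2, 8, 3, 64, 64)   # (batch, frames, C, H, W) face crops
spec = torch.randn(2, 8, 3, 64, 64)    # spectrogram chunks, same layout
labels = weighted_score_fusion(
    [visual_net(video), audio_net(spec)], weights=[0.6, 0.4]
)
print(labels.shape)  # torch.Size([2]) -> one expression label per clip
```

In practice, the fusion weights would be tuned on the challenge validation partition, and the frame encoder would typically be a pretrained network fine-tuned via transfer learning, as the abstract indicates.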
ISSN: 2414-4088
DOI: 10.3390/mti6020011