End-to-End Modeling and Transfer Learning for Audiovisual Emotion Recognition in-the-Wild

Bibliographic Details
Published in: Multimodal Technologies and Interaction, Vol. 6, No. 2, p. 11
Main Authors: Dresvyanskiy, Denis; Ryumina, Elena; Kaya, Heysem; Markitantov, Maxim; Karpov, Alexey; Minker, Wolfgang
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.02.2022
Summary: As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention over the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild”, data. This work investigates audiovisual deep learning approaches to the in-the-wild emotion recognition problem. Inspired by the outstanding performance of end-to-end and transfer learning techniques, we explored the effectiveness of architectures in which a modality-specific Convolutional Neural Network (CNN) is followed by a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN), using the AffWild2 dataset under the Affective Behavior Analysis in-the-Wild (ABAW) challenge protocol. We deployed unimodal end-to-end and transfer learning approaches within a multimodal fusion system, which generated final predictions via a weighted score fusion scheme. With the proposed deep-learning-based multimodal system, we reached a test-set challenge performance measure of 48.1% on the ABAW 2020 Facial Expressions challenge, surpassing the performance of the first runner-up.
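
The pipeline the abstract describes, a modality-specific CNN encoding each frame, an LSTM modeling the sequence, and per-modality scores combined by weighted score fusion, can be illustrated with a minimal PyTorch sketch. Everything below (the small stand-in backbone, feature sizes, the 7-class expression head, and the 0.6/0.4 fusion weights) is an illustrative assumption, not the authors' published configuration.

```python
# Hypothetical sketch of a CNN -> LSTM unimodal branch plus weighted score
# fusion; shapes, class count, and weights are assumptions for illustration.
import torch
import torch.nn as nn

class CnnLstmEmotionNet(nn.Module):
    """Unimodal model: per-frame CNN features followed by an LSTM over time."""
    def __init__(self, num_classes: int = 7, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(  # stand-in frame encoder
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # per-frame features
        out, _ = self.lstm(feats)        # temporal modeling across the clip
        return self.head(out[:, -1])     # class logits from the last time step

def weighted_score_fusion(scores, weights):
    """Weighted sum of per-modality class probabilities; returns hard labels."""
    fused = sum(w * s.softmax(dim=-1) for w, s in zip(weights, scores))
    return fused.argmax(dim=-1)

# Toy usage: fuse scores from a visual and an audio branch. Treating audio as
# image-like spectrogram stacks and the 0.6/0.4 weights are assumptions.
visual_net, audio_net = CnnLstmEmotionNet(), CnnLstmEmotionNet()
video = torch.randn(2, 8, 3, 64, 64)   # (batch, frames, C, H, W) face crops
spec = torch.randn(2, 8, 3, 64, 64)    # spectrogram chunks, same layout
labels = weighted_score_fusion(
    [visual_net(video), audio_net(spec)], weights=[0.6, 0.4]
)
print(labels.shape)  # torch.Size([2]) -> one expression label per clip
```

In practice, the fusion weights would be tuned on the challenge validation partition, and the frame encoder would typically be a pretrained network fine-tuned via transfer learning, as the abstract indicates.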
ISSN: 2414-4088
DOI: 10.3390/mti6020011