Multi-Corpus Affect Recognition with Emotion Embeddings and Self-Supervised Representations of Speech
Published in: International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1 - 8
Main Authors: , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 18.10.2022
Summary: Speech emotion recognition systems use data-driven machine learning techniques that rely on annotated corpora. To achieve usable performance in real life, we need to exploit multiple different datasets, since each one can shed light on some specific expression of affect. However, different corpora use subjectively defined annotation schemes, which makes it challenging to train a model that can sense similar emotions across corpora. Here, we propose a method that can relate similar emotions across corpora without being explicitly trained to do so. Our method relies on self-supervised representations, which provide highly contextualised speech representations, and on multi-task learning paradigms, which allow training on different corpora without changing their labelling schemes. The results show that by fine-tuning self-supervised representations on each corpus separately, we can significantly improve the state-of-the-art within-corpus performance. We further demonstrate that using multiple corpora to train the same model improves cross-corpus performance, and show that our emotion embeddings can effectively recognise the same emotions across different corpora.
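To make the multi-task setup described in the abstract concrete, the sketch below shows one common way to combine a shared self-supervised speech encoder with one classification head per corpus, so that each corpus keeps its own labelling scheme and a shared bottleneck provides the emotion embedding. This is a minimal illustration under assumed design choices (wav2vec 2.0 as the encoder, mean pooling, linear heads, the corpus names and label counts), not the authors' implementation.

```python
# Minimal sketch of multi-corpus training with a shared self-supervised
# encoder and per-corpus heads. Assumptions (not from the paper): the
# "facebook/wav2vec2-base" checkpoint, mean pooling over time, a 256-d
# embedding layer, and the example corpus names and class counts.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiCorpusEmotionModel(nn.Module):
    def __init__(self, corpus_num_labels, embed_dim=256):
        super().__init__()
        # Shared self-supervised speech encoder, fine-tuned end to end.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        hidden = self.encoder.config.hidden_size
        # Shared projection producing the common "emotion embedding" space.
        self.embedding = nn.Sequential(nn.Linear(hidden, embed_dim), nn.Tanh())
        # One output head per corpus, keyed by corpus name, so each corpus
        # is trained with its own label set (multi-task learning).
        self.heads = nn.ModuleDict({
            name: nn.Linear(embed_dim, n)
            for name, n in corpus_num_labels.items()
        })

    def forward(self, waveform, corpus):
        # waveform: (batch, samples) of raw 16 kHz audio.
        frames = self.encoder(waveform).last_hidden_state   # (B, T, hidden)
        emb = self.embedding(frames.mean(dim=1))            # (B, embed_dim)
        return self.heads[corpus](emb), emb                 # logits, embedding

# Hypothetical corpora and label counts, purely for illustration.
model = MultiCorpusEmotionModel({"IEMOCAP": 4, "MSP-Podcast": 8})
logits, emotion_emb = model(torch.randn(2, 16000), corpus="IEMOCAP")
```

At inference time, the shared `emotion_emb` vector is what would let similar emotions from different corpora land close together, even though each corpus was supervised only through its own head.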
ISSN: 2156-8111
DOI: 10.1109/ACII55700.2022.9953840