Integration of Annotator-wise Estimations for Emotion Recognition by Using Group Softmax

Bibliographic Details
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 694-699
Main Author: Tachioka, Yuuki
Format: Conference Proceeding
Language: English
Published: APSIPA, 14.12.2021

Summary: In emotion recognition, a major modeling difficulty arises from the differing perceptions of emotion from annotator to annotator. It is common to use a one-hot (dominant) emotion label obtained by majority voting over the annotator-wise (minor) emotion labels. Previous studies show that introducing soft-target labels, which reflect the frequency of annotator-wise labels, improves emotion recognition performance; however, these studies did not use the minor emotion labels directly. Another study used multi-task learning to handle dominant and minor emotions independently, but such independent modeling is inappropriate because the two are closely related. We propose a sequential model composed of multiple annotator-wise classifiers whose outputs are combined by majority voting to estimate the dominant emotion. When multiple classifiers are used, classifier imbalance, where the difficulty of classification differs from classifier to classifier, degrades performance. To address this classifier imbalance, we assign a group softmax to the multiple annotator-wise classifiers. Experiments show that majority voting over the estimated annotator-wise emotions improves the estimation of dominant emotions compared with conventional methods that estimate the dominant emotion directly. In addition, the proposed method is effective not only for speech emotion recognition but also for speech-and-text emotion recognition.
ISSN: 2640-0103
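
The abstract gives no implementation details, so the following PyTorch sketch only illustrates the general idea under stated assumptions: a shared utterance embedding is projected to num_annotators × num_emotions logits, a softmax is applied independently within each annotator's group (the group softmax), and the dominant emotion is obtained by majority voting over the annotator-wise predictions. The encoder, the single linear projection, the dimensions, and all names here are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not the paper's implementation): group-softmax head
# over annotator-wise emotion classifiers, followed by majority voting.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupSoftmaxHead(nn.Module):
    """Projects a feature vector to num_annotators * num_emotions logits and
    applies softmax separately within each annotator's group of logits."""

    def __init__(self, feat_dim: int, num_annotators: int, num_emotions: int):
        super().__init__()
        self.num_annotators = num_annotators
        self.num_emotions = num_emotions
        self.proj = nn.Linear(feat_dim, num_annotators * num_emotions)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feat_dim), e.g. an utterance-level embedding
        logits = self.proj(features)                                   # (batch, A*E)
        logits = logits.view(-1, self.num_annotators, self.num_emotions)
        # Group softmax: normalize over emotions separately per annotator
        return torch.softmax(logits, dim=-1)                           # (batch, A, E)


def majority_vote(annotator_probs: torch.Tensor) -> torch.Tensor:
    """Estimate the dominant emotion by voting over annotator-wise argmaxes."""
    votes = annotator_probs.argmax(dim=-1)                             # (batch, A)
    counts = F.one_hot(votes, num_classes=annotator_probs.size(-1)).sum(dim=1)
    return counts.argmax(dim=-1)                                       # (batch,)


if __name__ == "__main__":
    # Placeholder sizes: 3 annotators, 4 emotion classes, 256-dim features
    head = GroupSoftmaxHead(feat_dim=256, num_annotators=3, num_emotions=4)
    x = torch.randn(8, 256)          # stand-in for a speech/text encoder output
    probs = head(x)
    print(majority_vote(probs))      # dominant-emotion indices, shape (8,)
```

During training, one would presumably attach a per-annotator cross-entropy loss to each group against that annotator's label, but the loss design, encoder, and tie-breaking rule for the vote are not specified in the abstract.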