An optimal model with a lower bound of recall for imbalanced speech emotion recognition

Bibliographic Details
Published in Multimedia tools and applications Vol. 79; no. 33-34; pp. 24281-24301
Main Authors Ai, Xusheng, Sheng, Victor S., Fang, Wei, Ling, Charles X.
Format Journal Article
Language English
Published New York Springer US 01.09.2020
Springer Nature B.V
Summary: In an early complaint warning system, a common problem is the scarcity of angry-emotion samples for training classification models. Moreover, recognizing anger is more important than recognizing non-anger. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (the TF1 score) that takes both the recall lower bound and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a BCE loss function or a focal loss function, the training process can find a model whose recall is above a high threshold and whose F1 score is maximal. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
ISSN: 1380-7501
1573-7721
DOI: 10.1007/s11042-020-09155-3
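
The summary describes two ideas that a short sketch may make more concrete: an F1 variant that only counts when recall clears a lower bound, and the focal loss as an alternative to BCE. The Python below is an illustration under stated assumptions, not the paper's implementation. This record does not give the exact TF1 formula, so `thresholded_f1` assumes the simplest reading (the score is the plain F1 when recall meets the bound and 0 otherwise). `binary_focal_loss` follows the commonly used focal loss form, and its `gamma` and `alpha` settings are illustrative. All function and parameter names are hypothetical.

```python
import math


def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for the positive class, labelled 1 (e.g. anger)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall


def thresholded_f1(y_true, y_pred, recall_lower_bound):
    """ASSUMED form of a recall-bounded F1: plain F1 if recall meets the
    lower bound, otherwise 0. The paper's TF1 definition may differ."""
    f1, recall = f1_and_recall(y_true, y_pred)
    return f1 if recall >= recall_lower_bound else 0.0


def binary_focal_loss(y_true, probs, gamma=2.0, alpha=None, eps=1e-7):
    """Mean binary focal loss: -w * (1 - p_t)**gamma * log(p_t).

    With gamma=0 and alpha=None this reduces to ordinary BCE.
    """
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p    # probability given to the true class
        if alpha is None:
            weight = 1.0                  # no class weighting
        else:
            weight = alpha if t == 1 else 1.0 - alpha
        total += -weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

Under this reading, model selection would keep the checkpoint with the highest `thresholded_f1` on validation data, so any checkpoint whose recall falls below the bound is never preferred over one that meets it.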