An optimal model with a lower bound of recall for imbalanced speech emotion recognition


Bibliographic Details
Published in	Multimedia tools and applications Vol. 79; no. 33-34; pp. 24281 - 24301
Main Authors	Ai, Xusheng, Sheng, Victor S., Fang, Wei, Ling, Charles X.
Format	Journal Article
Language	English
Published	New York: Springer US, 01.09.2020; Springer Nature B.V.
Subjects	Accuracy; Algorithms; Anger; Artificial neural networks; Call centers; Classification; Computer Communication Networks; Computer Science; Customers; Data Structures and Information Theory; Deep learning; Emergency communications systems; Emotion recognition; Emotions; Lower bounds; Multimedia; Multimedia Information Systems; Neural networks; Performance evaluation; Recall; Software; Special Purpose and Application-Based Systems; Speech; Speech recognition; Training; Warning systems; Deep neural network; Imbalance; Convolutional neural network; Speech emotion recognition

Summary:	In an early complaint warning system, we encounter a common problem: the scarcity of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model whose recall stays above a lower bound while its F1 score is maximized. The work has three parts: 1) a variant of the F1 score, the TF1 score, that takes into account both recall above a lower bound and the F1 score; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find the model with the maximum TF1 score; and 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a binary cross-entropy (BCE) loss or a focal loss, the training process can find a model whose recall exceeds a high threshold and whose F1 score is maximal. In particular, SEDNN with the focal loss performs better than SEDNN with the BCE loss.
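The summary does not give the exact TF1 formula or the selection procedure, so the following is only a minimal sketch under stated assumptions. It treats anger as the positive class (label 1) and assumes TF1 equals the F1 score when recall reaches the lower bound and 0 otherwise. It also assumes the "optimal model" is the training checkpoint (here, one per epoch) with the highest TF1 on validation data. The two losses named in the summary are shown per sample: standard binary cross-entropy, and the focal loss of Lin et al. with its commonly used defaults alpha = 0.25 and gamma = 2. All function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 with label 1 (anger) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def tf1_score(y_true: Sequence[int], y_pred: Sequence[int], recall_lower_bound: float) -> float:
    """ASSUMED form of TF1: the F1 score if recall reaches the lower bound, else 0."""
    _, recall, f1 = precision_recall_f1(y_true, y_pred)
    return f1 if recall >= recall_lower_bound else 0.0


def select_best_epoch(per_epoch_preds: List[Sequence[int]], y_true: Sequence[int],
                      recall_lower_bound: float) -> Tuple[int, float]:
    """Hypothetical selection rule: index and TF1 of the epoch with maximal TF1."""
    scores = [tf1_score(y_true, preds, recall_lower_bound) for preds in per_epoch_preds]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted anger probability p and label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Focal loss: BCE scaled by alpha_t * (1 - p_t)**gamma, which down-weights easy samples."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    epochs = [
        [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # highest F1, but recall 0.75 is below the bound
        [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # recall 1.0, F1 0.8
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # recall 1.0, lower F1
    ]
    best, score = select_best_epoch(epochs, y_true, recall_lower_bound=0.8)
    print(f"best epoch = {best}, TF1 = {score:.3f}")  # prints: best epoch = 1, TF1 = 0.800
```

In the usage example, epoch 0 has the highest plain F1 (about 0.857) but is rejected because its recall misses the bound, which is the trade-off the summary describes. With gamma = 0 and alpha = 0.5 the focal loss reduces to half the BCE, so gamma is what controls how strongly well-classified samples are discounted.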
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-020-09155-3