Attention-Based Convolution Skip Bidirectional Long Short-Term Memory Network for Speech Emotion Recognition

Bibliographic Details
Published in	IEEE access Vol. 9; pp. 5332 - 5342
Main Authors	Zhang, Huiyun, Huang, Heming, Han, Henry
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
Subjects	Acoustics; attention mechanism; Convolution; Deep learning; Emotion recognition; Emotions; Hidden Markov models; Logic gates; Machine learning; Model accuracy; Natural language processing; Short term; skip connection; Speech recognition; weighted pooling

More Information
Summary:	Speech emotion recognition is a challenging task in natural language processing. It relies heavily on the effectiveness of speech features and acoustic models. However, existing acoustic models may not handle speech emotion recognition efficiently because of their built-in limitations. In this work, a novel deep-learning acoustic model called attention-based skip convolution bi-directional long short-term memory, abbreviated as SCBAMM, is proposed to recognize speech emotion. It has eight hidden layers: two dense layers, a convolutional layer, a skip layer, a mask layer, a Bi-LSTM layer, an attention layer, and a pooling layer. SCBAMM makes better use of spatiotemporal information and captures emotion-related features more effectively. In addition, it partially mitigates the exploding-gradient and vanishing-gradient problems of deep learning. On the EMO-DB and CASIA databases, SCBAMM achieves accuracies of 94.58% and 72.50%, respectively. To the best of our knowledge, these are the highest accuracies among peer models.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2020.3047395
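The summary names a skip layer, a mask layer, and attention-based (weighted) pooling over Bi-LSTM outputs, but gives no equations. The plain-Python sketch below illustrates two of these generic ideas under stated assumptions: a residual skip connection y = x + F(x), and masked attention pooling that scores each frame by a dot product with a query vector, excludes padded frames, softmax-normalizes the remaining scores, and returns the weighted sum of frames. The dot-product scoring, the query vector, and the function names are hypothetical; this is not the authors' SCBAMM implementation, and the record does not say how their skip, mask, or attention layers are actually defined.

```python
import math
from typing import List, Sequence

Vector = List[float]


def skip_connection(x: Vector, fx: Vector) -> Vector:
    """Residual (skip) connection: add a block's input to its output, y = x + F(x)."""
    if len(x) != len(fx):
        raise ValueError("skip connection requires matching dimensions")
    return [a + b for a, b in zip(x, fx)]


def masked_attention_pooling(
    frames: Sequence[Vector], query: Vector, mask: Sequence[bool]
) -> Vector:
    """Collapse per-frame feature vectors into one utterance-level vector.

    Assumed scoring: e_t = <h_t, query>. Frames whose mask entry is False
    (e.g. padding) get a score of -inf, so their softmax weight is exactly 0.
    Output: sum_t a_t * h_t with a_t = softmax(e)_t.
    """
    if len(frames) != len(mask):
        raise ValueError("frames and mask must have the same length")
    scores = [
        sum(h * q for h, q in zip(frame, query)) if keep else -math.inf
        for frame, keep in zip(frames, mask)
    ]
    valid = [s for s in scores if s != -math.inf]
    if not valid:
        raise ValueError("mask removes every frame")
    m = max(valid)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for masked frames
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


# Usage: the third frame is padding, so only the first two are pooled.
frames = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
pooled = masked_attention_pooling(frames, query=[1.0, 1.0], mask=[True, True, False])
residual = skip_connection([1.0, 2.0], [0.5, -1.0])
```

With the padded frame masked out, the two remaining frames score equally and each receives weight 0.5, so `pooled` is `[0.5, 0.5]`, and `residual` is `[1.5, 1.0]`. In a trained network the query would be a learned parameter and the frames would be Bi-LSTM outputs; both are assumptions in this sketch.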