Emotion recognition from speech with StarGAN and Dense‐DCNN


Bibliographic Details
Published in: IET Signal Processing, Vol. 16, No. 1, pp. 62–79
Main Authors: Li, Lu‐Qiao; Xie, Kai; Guo, Xiao‐Long; Wen, Chang; He, Jian‐Biao
Format: Journal Article
Language: English
Published: John Wiley & Sons, Inc., 01.02.2022

Summary: Both traditional and recent speech emotion recognition methods face the same problem: the lack of standard emotional speech data sets. With only limited data, the network cannot learn emotion features comprehensively. Moreover, these methods require extremely long training times, which makes efficient classification difficult. The proposed network, Dense‐DCNN, combined with StarGAN, addresses this issue: StarGAN generates numerous Log‐Mel spectra with the corresponding emotions, and the Dense‐DCNN extracts high‐dimensional features from them to achieve high‐precision classification. Classification accuracy exceeded 90% on all data sets. At the same time, DenseNet's skip connections speed up classification, thereby improving efficiency. Experimental verification shows that the model not only generalises well but also remains robust in multi‐scene and multi‐noise environments, showing potential for application in the medical and social education industries.
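The Log‐Mel spectrum mentioned in the summary is a standard front-end feature for speech emotion recognition: a short-time Fourier transform whose power spectrum is pooled through a mel-scaled filterbank and log-compressed. As an illustration only (the paper's exact frame size, hop, and mel configuration are not given here, so the parameters below are assumptions), a minimal numpy sketch:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centres spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising slope
            if centre > left:
                fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling slope
            if right > centre:
                fb[i - 1, k] = (right - k) / (right - centre)
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Pool the power spectrum through the mel filterbank, then log-compress.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)
```

For one second of 16 kHz audio with these assumed settings, the result is a (frames × mel-bands) matrix that can be treated as a single-channel image, which is how GAN-based augmenters such as StarGAN and CNN classifiers such as the Dense‐DCNN typically consume it.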
Bibliography: Lu‐Qiao Li, Kai Xie, and Xiao‐Long Guo contributed equally to this work.
ISSN: 1751-9675, 1751-9683
DOI: 10.1049/sil2.12078