Emotion recognition from speech with StarGAN and Dense‐DCNN
Published in | IET signal processing Vol. 16; no. 1; pp. 62 - 79 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | John Wiley & Sons, Inc, 01.02.2022; Wiley |
Summary: | Both traditional and recent speech emotion recognition methods face the same problem: a lack of standard emotional speech data sets. With limited data, a network cannot learn emotion features comprehensively. Moreover, these methods require extremely long training times, which makes efficient classification difficult. The proposed network, Dense‐DCNN, combined with StarGAN, addresses these issues. StarGAN generates numerous Log‐Mel spectra with the corresponding emotions, and the Dense‐DCNN extracts high‐dimensional features from them to achieve high‐precision classification. The classification accuracy on all the data sets was more than 90%. In addition, DenseNet's layer‐skipping connections speed up the classification process, thereby improving efficiency. Experimental verification shows that the model not only generalises well but is also robust in multiscene and multinoise environments, showing potential for application in the medical and social education industries. |
---|---|
Bibliography: | Lu‐Qiao Li, Kai Xie, and Xiao‐Long Guo contributed equally to this work. |
ISSN: | 1751-9675 1751-9683 |
DOI: | 10.1049/sil2.12078 |