Multiple attention convolutional-recurrent neural networks for speech emotion recognition

Bibliographic Details
Published in 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) pp. 1 - 8
Main Authors Zhang, Zhihao, Wang, Kunxia
Format Conference Proceeding
Language English
Published IEEE 18.10.2022
Subjects
Online Access Get full text
DOI 10.1109/ACIIW57231.2022.10086021

Abstract Speech Emotion Recognition (SER) is of great significance in the research fields of human-computer interaction and affective computing. One of the major challenges for SER lies in how to extract effective emotional features from lengthy utterances. However, since most existing deep-learning-based SER models adopt only Log-Mel spectrograms as input, they cannot fully convey the emotional information in speech. Furthermore, the limited extraction ability of such models may make it difficult to capture key emotional representations. To address these issues, we propose a new multiple-attention convolutional recurrent network comprising convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) modules, which use extracted Mel-spectrum and Fourier coefficient features respectively, helping to complement the emotional information. The multiple attention mechanisms in our model are as follows: spatial and channel attention mechanisms are added to the CNN module to focus on key emotional areas and locate more effective features, while temporal attention assigns weights to the time-series segment features after the BiLSTM extracts sequence information. Experimental results show that the model achieves weighted accuracies (WA) of 87.9%, 76.5%, and 75.2% and unweighted accuracies (UA) of 87.6%, 73.5%, and 70.1% on the EMODB, IEMOCAP, and EESDB speech datasets respectively, outperforming most state-of-the-art methods.
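The temporal attention step described in the abstract (weighting each BiLSTM time-step feature before producing an utterance-level summary) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the single learned projection vector `w`, and the tanh scoring are assumptions standing in for whatever scoring network the paper actually uses.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_pool(seq, w, b=0.0):
    """Pool a (T, D) sequence of frame features into one (D,) vector.

    seq: per-time-step features, e.g. BiLSTM outputs (T frames, D dims)
    w, b: illustrative learned scoring parameters (hypothetical names)
    Returns the attention-weighted summary and the weights themselves.
    """
    scores = np.tanh(seq @ w + b)   # (T,) unnormalized relevance per frame
    alpha = softmax(scores)         # (T,) attention weights, non-negative, sum to 1
    return alpha @ seq, alpha       # weighted sum over time -> (D,)

# Toy usage: 50 frames of 128-dim sequence features.
rng = np.random.default_rng(0)
T, D = 50, 128
seq = rng.standard_normal((T, D))
w = rng.standard_normal(D) * 0.1
summary, alpha = temporal_attention_pool(seq, w)
```

The summary vector would then feed a classifier head; frames the scoring function rates as more emotionally salient contribute proportionally more to it.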
Author Wang, Kunxia
Zhang, Zhihao
Author_xml – sequence: 1
  givenname: Zhihao
  surname: Zhang
  fullname: Zhang, Zhihao
  email: 1511827481@qq.com
  organization: School of Electronic and Information Engineering, Anhui Jianzhu University, Anhui International Joint Research Center for Ancient Architecture Intellisencing and Multi-Dimensional Modeling, Hefei, China
– sequence: 2
  givenname: Kunxia
  surname: Wang
  fullname: Wang, Kunxia
  email: kxwang@ahjzu.edu.cn
  organization: Higher Education Institutes, School of Electronic and Information Engineering, Anhui Jianzhu University, Key Laboratory of Architectural Acoustic Environment of Anhui, Hefei, China
BookMark eNo1j8tKxDAYhSPowhl9Axd5gdY_16bLoXgZGHGjiKsh6fzRYicpaar49nZGXX0HzgXOgpyGGJAQyqBkDOrrVbNev6iKC1Zy4LxkAEYDZydkwbRWUskaxDl5fZj63A09UpszhtzFQNsYPmM_HbTti4TtlNJs0YBTsv2M_BXTx0h9THQcENt3ivt4rM7h-Ba6g74gZ972I17-cUmeb2-emvti83i3blabouMgc2F1xU3lvWTeOeNapiwDKbTz0hlVG9gJh1w5ZAg1h1poacEw5GaH8xEUS3L1u9sh4nZI3d6m7-3_XfEDGh5Sbw
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ACIIW57231.2022.10086021
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1665454903
9781665454902
EndPage 8
ExternalDocumentID 10086021
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i204t-a67287ff41fbb8bc15a10436bf4b85980d3be25be1e09209364a081e28de166e3
IEDL.DBID RIE
IngestDate Thu Jan 18 11:14:29 EST 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-a67287ff41fbb8bc15a10436bf4b85980d3be25be1e09209364a081e28de166e3
PageCount 8
ParticipantIDs ieee_primary_10086021
PublicationCentury 2000
PublicationDate 2022-Oct.-18
PublicationDateYYYYMMDD 2022-10-18
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-Oct.-18
  day: 18
PublicationDecade 2020
PublicationTitle 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
PublicationTitleAbbrev ACIIW
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.8201442
Snippet Speech Emotion Recognition is of great significance in the research field of human-computer interaction and affective computing. One of the major challenges...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Affective computing
Convolutional neural networks
Emotion recognition
Feature extraction
Human-computer interaction
Multiple attention mechanisms
Recurrent neural networks
Speech emotion recognition
Speech recognition
Time series analysis
Title Multiple attention convolutional-recurrent neural networks for speech emotion recognition
URI https://ieeexplore.ieee.org/document/10086021
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjZ3PS8MwFMeD7uRJxYm_ycFruiRNsvQow7EJGx4cztNo0lcURzdcd_GvN68_JgqCp5ZSaMkrvPfS7_fzCLnNQpGeqdwwzzPPlPacWeNyZmXuE2u1TR36nSdTM5qph7meN2b1ygsDAJX4DCI8rf7lZyu_xa2yHoJoDEfb-H7o3GqzVqvO4UnvbjAeP-t-qFhC3ydl1N7-Y3BKlTeGh2TaPrGWi7xH29JF_vMXjPHfr3REut8WPfq4Sz7HZA-KE_IyaeSBFKmZlY6Roqy8-bzSJfvA7XUEMlEEWabLcKhk4Bsaile6WQP4Vwr1aB-6Exetii6ZDe-fBiPWzE5gb5KrkqWmH3qhPFcid846L3QqkDbvcuWsTizPYgdSOxDAE8mT2Kg0VAcgbQbCGIhPSadYFXBGKPI7ZeiaFNK4hMlSHvuYA8i-DqGMxTnp4ros1jUeY9EuycUf1y_JAYYHE4CwV6RTfmzhOmT20t1UEf0CzIOluA
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1NSwMxEA2iBz2pWPHbHLzuNskmafYoxdJqWzy0WE9lk51FsbSl3V789Wb2o6IgeMoSCLtkAm8m-94bQu5Sn6SnMtOBY6kLpHIsMNpmgRGZi41RJrGodx4MdXcsHydqUonVCy0MABTkMwjxsfiXny7cBq_KmmhEoxnKxvc88CteyrVqfg6Lm_ftXu9FtXzO4is_IcJ6wY_WKQVydA7JsH5nSRj5CDe5Dd3nLzvGf3_UEWl8i_To8xZ-jskOzE_I66AiCFL0zSyYjBSJ5dUBS2bBCi_Y0ZKJopVlMvNDQQRfU5--0vUSwL1RKJv70C29aDFvkHHnYdTuBlX3hOBdMJkHiW75aijLJM-sNdZxlXD0m7eZtEbFhqWRBaEscGCxYHGkZeLzAxAmBa41RKdkd76Ywxmh6OApfN0k0Y-L6zRhkYsYgGgpH8yIn5MG7st0WRpkTOstufhj_pbsd0eD_rTfGz5dkgMMFcIBN1dkN19t4NrjfG5viuh-AaD4qQE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2022+10th+International+Conference+on+Affective+Computing+and+Intelligent+Interaction+Workshops+and+Demos+%28ACIIW%29&rft.atitle=Multiple+attention+convolutional-recurrent+neural+networks+for+speech+emotion+recognition&rft.au=Zhang%2C+Zhihao&rft.au=Wang%2C+Kunxia&rft.date=2022-10-18&rft.pub=IEEE&rft.spage=1&rft.epage=8&rft_id=info:doi/10.1109%2FACIIW57231.2022.10086021&rft.externalDocID=10086021