End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings

Bibliographic Details
Published in	International Conference on Affective Computing and Intelligent Interaction and workshops, pp. 1-8
Main Authors	Deschamps-Berger, Theo; Lamel, Lori; Devillers, Laurence
Format	Conference Proceeding
Language	English
Published	IEEE 28.09.2021
Subjects	Affective computing; call center; complex emotions; Computer architecture; Deep learning; Diversity reception; emotion detection; Emotion recognition; end-to-end deep learning architecture; real-life database; Speech recognition
ISSN	2156-8111
DOI	10.1109/ACII52823.2021.9597419

Abstract	Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve equivalent or even better results than conventional machine learning approaches. In this paper, to validate the performance of our neural network architecture for emotion recognition from speech, we first trained and tested it on IEMOCAP, a corpus widely used by and accessible to the community. We then used the same architecture with the real-life corpus CEMO, comprising 440 dialogues (2h16m) from 485 speakers. The most frequent emotions expressed by callers in these real-life emergency dialogues are fear, anger and positive emotions such as relief. In the IEMOCAP general-topic conversations, the most frequent emotions are sadness, anger and happiness. Using the same end-to-end deep learning architecture, an Unweighted Accuracy Recall (UA) of 63% is obtained on IEMOCAP and a UA of 45.6% on CEMO, each with 4 classes. Using only 2 classes (Anger, Neutral), the results for CEMO are 76.9% UA compared to 81.1% UA for IEMOCAP. We expect that these encouraging results with CEMO can be improved by combining the audio channel with the linguistic channel. Real-life emotions are clearly more complex than acted ones, mainly due to the large diversity of emotional expressions of speakers.
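Note on the UA metric	The abstract reports its results as UA. The sketch below assumes UA means the unweighted mean of per-class recalls, which is the usual reading in speech emotion recognition; this record does not spell out the definition, and the code is not the authors' evaluation code. The function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: each class counts equally, whatever its size.

    Assumption: this is the conventional definition of UA, not one taken
    from the paper itself.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    total = defaultdict(int)    # number of reference examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per reference class, then averaged without weighting.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced two-class toy data (not drawn from CEMO or IEMOCAP).
y_true = ["anger"] * 2 + ["neutral"] * 8
y_pred = ["anger", "neutral"] + ["neutral"] * 8

ua = unweighted_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UA = {ua:.2f}, plain accuracy = {plain:.2f}")
```

On imbalanced data such as this toy set, the plain accuracy is dominated by the majority class, while UA penalizes the missed minority-class example more heavily. That is one reason a class-balanced metric is commonly preferred when emotion classes are unevenly represented.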
Author affiliations	Deschamps-Berger, Theo (theo.deschamps-berger@u-psud.fr): LISN, Paris-Saclay University, CNRS, Orsay, France; Lamel, Lori (lori.lamel@limsi.fr): LISN, CNRS, Orsay, France; Devillers, Laurence (devil@limsi.fr): LISN, CNRS, Orsay, France
Discipline	Computer Science
EISBN	1665400196 9781665400190
EISSN	2156-8111
OpenAccessLink	https://hal.science/hal-03405970/document
PublicationTitleAbbrev	ACII
URI	https://ieeexplore.ieee.org/document/9597419