End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings
Published in | International Conference on Affective Computing and Intelligent Interaction and workshops pp. 1 - 8 |
---|---|
Main Authors | Deschamps-Berger, Theo; Lamel, Lori; Devillers, Laurence |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 28.09.2021 |
Subjects | |
ISSN | 2156-8111 |
DOI | 10.1109/ACII52823.2021.9597419 |
Abstract | Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve results equivalent to, or even better than, those of conventional machine learning approaches. In this paper, to validate the performance of our neural network architecture for speech emotion recognition, we first trained and tested it on IEMOCAP, a widely used corpus available to the research community. We then applied the same architecture to CEMO, a real-life corpus consisting of 440 dialogs (2h16m) from 485 speakers. The most frequent emotions expressed by callers in these real-life emergency dialogs are fear, anger and positive emotions such as relief. In the general-topic conversations of IEMOCAP, the most frequent emotions are sadness, anger and happiness. With the same end-to-end deep learning architecture, an unweighted accuracy (UA) of 63% is obtained on IEMOCAP and a UA of 45.6% on CEMO, each with 4 classes. With only 2 classes (Anger, Neutral), the results for CEMO are 76.9% UA, compared to 81.1% UA for IEMOCAP. We expect that these encouraging results on CEMO can be improved by combining the audio channel with the linguistic channel. Real-life emotions are clearly more complex than acted ones, mainly due to the large diversity of speakers' emotional expressions. |
---|---|
Author | Deschamps-Berger, Theo; Lamel, Lori; Devillers, Laurence |
Author_xml | 1. Deschamps-Berger, Theo (theo.deschamps-berger@u-psud.fr), LISN, Paris-Saclay University, CNRS, Orsay, France; 2. Lamel, Lori (lori.lamel@limsi.fr), LISN, CNRS, Orsay, France; 3. Devillers, Laurence (devil@limsi.fr), LISN, CNRS, Orsay, France |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/ACII52823.2021.9597419 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 1665400196; 9781665400190 |
EISSN | 2156-8111 |
EndPage | 8 |
ExternalDocumentID | 9597419 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
ID | FETCH-LOGICAL-h285t-a402cb3f7c5cbc15fc9db3c31a436796fb286a5942c3537c1c4400b3ccb404953 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:27:03 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-h285t-a402cb3f7c5cbc15fc9db3c31a436796fb286a5942c3537c1c4400b3ccb404953 |
OpenAccessLink | https://hal.science/hal-03405970/document |
PageCount | 8 |
ParticipantIDs | ieee_primary_9597419 |
PublicationCentury | 2000 |
PublicationDate | 2021-Sept.-28 |
PublicationDateYYYYMMDD | 2021-09-28 |
PublicationDate_xml | – month: 09 year: 2021 text: 2021-Sept.-28 day: 28 |
PublicationDecade | 2020 |
PublicationTitle | International Conference on Affective Computing and Intelligent Interaction and workshops |
PublicationTitleAbbrev | ACII |
PublicationYear | 2021 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0001950885 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Affective computing; call center; complex emotions; Computer architecture; Deep learning; Diversity reception; emotion detection; Emotion recognition; end-to-end deep learning architecture; real-life database; Speech recognition |
Title | End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings |
URI | https://ieeexplore.ieee.org/document/9597419 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=International+Conference+on+Affective+Computing+and+Intelligent+Interaction+and+workshops&rft.atitle=End-to-End+Speech+Emotion+Recognition%3A+Challenges+of+Real-Life+Emergency+Call+Centers+Data+Recordings&rft.au=Deschamps-Berger%2C+Theo&rft.au=Lamel%2C+Lori&rft.au=Devillers%2C+Laurence&rft.date=2021-09-28&rft.pub=IEEE&rft.eissn=2156-8111&rft.spage=1&rft.epage=8&rft_id=info:doi/10.1109%2FACII52823.2021.9597419&rft.externalDocID=9597419 |
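The abstract reports its results as UA scores for 4-class and 2-class (Anger, Neutral) tasks. In the speech emotion recognition literature, UA is usually computed as the mean of per-class recalls, UA = (1/K) Σ_k recall_k, so that every class counts equally however many segments it has; this matters for imbalanced real-life data such as CEMO. The record does not spell out the exact definition used, so the sketch below assumes the mean-per-class-recall form. The function names `unweighted_accuracy` and `weighted_accuracy` and the toy labels are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls over the classes present in y_true.

    Assumption: this is the usual SER definition of UA; the paper's exact
    definition is not stated in this record.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("empty input")
    support = Counter(y_true)  # reference segments per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [hits[c] / n for c, n in support.items()]  # missing keys in hits count as 0
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Plain accuracy, for contrast: dominated by the majority class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels (not from CEMO or IEMOCAP): an imbalanced
    # 2-class Anger/Neutral task where Neutral is the majority class.
    y_true = ["Neutral"] * 8 + ["Anger"] * 2
    y_pred = ["Neutral"] * 8 + ["Neutral", "Anger"]
    print(f"WA = {weighted_accuracy(y_true, y_pred):.2f}")   # 9 of 10 correct -> 0.90
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.2f}")  # (1.0 + 0.5) / 2 -> 0.75
```

On this toy input, plain accuracy (0.90) hides the fact that half of the minority Anger segments are missed, while UA (0.75) exposes it. This is why class-balanced scores are the usual choice when comparing a skewed real-life corpus against a more balanced acted one.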