End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings

Bibliographic Details
Published in International Conference on Affective Computing and Intelligent Interaction and workshops, pp. 1-8
Main Authors Deschamps-Berger, Theo; Lamel, Lori; Devillers, Laurence
Format Conference Proceeding
Language English
Published IEEE, 28.09.2021
ISSN 2156-8111
DOI 10.1109/ACII52823.2021.9597419

Abstract Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve results equivalent to, or even better than, those of conventional machine learning approaches. In this paper, to validate the performance of our neural network architecture for speech emotion recognition, we first trained and tested it on IEMOCAP, a widely used corpus available to the research community. We then applied the same architecture to CEMO, a real-life corpus comprising 440 dialogs (2h16m) from 485 speakers. The most frequent emotions expressed by callers in these real-life emergency dialogs are fear, anger, and positive emotions such as relief. In the general-topic conversations of IEMOCAP, the most frequent emotions are sadness, anger, and happiness. Using the same end-to-end deep learning architecture, we obtain an Unweighted Average Recall (UA) of 63% on IEMOCAP and of 45.6% on CEMO, each with 4 classes. With only 2 classes (Anger, Neutral), the UA for CEMO is 76.9%, compared with 81.1% for IEMOCAP. We expect that these encouraging results on CEMO can be improved by combining the audio channel with the linguistic channel. Real-life emotions are clearly more complex than acted ones, mainly due to the large diversity of emotional expressions across speakers.
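In speech emotion recognition, UA is commonly defined as the unweighted mean of the per-class recalls, so every emotion class counts equally however rare it is. The Python sketch below assumes that definition (it is not the authors' evaluation code) and contrasts it with plain accuracy on a hypothetical, imbalanced toy label set; the function names and data are illustrative only.

```python
from collections import Counter


def unweighted_accuracy(y_true, y_pred):
    """UA (assumed definition): mean of per-class recalls over classes in y_true."""
    totals = Counter(y_true)  # number of reference items per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct items per class
    recalls = [hits[c] / totals[c] for c in totals]  # Counter returns 0 for missing classes
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all items classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced 2-class toy data (Anger vs. Neutral).
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 10  # a degenerate model that always answers "neutral"

print(weighted_accuracy(y_true, y_pred))    # 0.8
print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

The degenerate model reaches 0.8 accuracy but only 0.5 UA. More generally, a classifier that always predicts a single class obtains a UA of exactly 1/K for K classes, whatever the class distribution: 25% in the 4-class setting and 50% in the 2-class (Anger, Neutral) setting. This gives a useful reference point when comparing the reported UA figures.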
Affiliations
Deschamps-Berger, Theo (theo.deschamps-berger@u-psud.fr): LISN, Paris-Saclay University, CNRS, Orsay, France
Lamel, Lori (lori.lamel@limsi.fr): LISN, CNRS, Orsay, France
Devillers, Laurence (devil@limsi.fr): LISN, CNRS, Orsay, France
Discipline Computer Science
EISBN 1665400196 (ISBN-10), 9781665400190 (ISBN-13)
OpenAccessLink https://hal.science/hal-03405970/document
PublicationTitleAbbrev ACII
SubjectTerms Affective computing; call center; complex emotions; Computer architecture; Deep learning; Diversity reception; emotion detection; Emotion recognition; end-to-end deep learning architecture; real-life database; Speech recognition
URI https://ieeexplore.ieee.org/document/9597419