Performance Comparison of Different Cepstral Features for Speech Emotion Recognition

Bibliographic Details
Published in 2018 International CET Conference on Control, Communication, and Computing (IC4), pp. 266-271
Main Authors Sugan, N., Sai Srinivas, N. S., Kar, Niladri, Kumar, L. S., Nath, Malaya Kumar, Kanhe, Aniruddha
Format Conference Proceeding
Language English
Published IEEE 01.07.2018
Abstract Speech emotion recognition (SER) systems are an important building block of modern technology, in which human-computer interaction plays an indispensable role. In this work, emotional speech samples are taken from two databases, namely the Berlin emotional speech database (Emo-DB) and the Surrey audio-visual expressed emotion speech database (SAVEE). Three cepstral features, mel-frequency cepstral coefficients (MFCC), human factor cepstral coefficients (HFCC) and gammatone frequency cepstral coefficients (GFCC), are extracted from the emotional speech samples. These features, which represent the emotional content of the speech signal, are used for training, validating and testing the classifiers. Two classifiers, a feedforward backpropagation artificial neural network (FF-BP-ANN) and a support vector machine (SVM), are used to develop the SER systems. The classifiers are trained to assign each input speech signal to one of the distinct emotional classes: anger, boredom, disgust, fear, happiness, neutral, sadness and surprise. Results are presented on how accurately each of the three cepstral features recognizes emotions in speech utterances from the two databases. Finally, the SER systems are compared with respect to features, classifiers and results from the existing literature.
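As an illustration of the feature-extraction step named in the abstract, the following is a minimal NumPy sketch of one common way to compute MFCCs. It is not the authors' implementation. The sampling rate (16 kHz), frame length and hop (400 and 160 samples), FFT size, number of mel filters (26) and number of coefficients (13) are all assumptions chosen for the example, and the function names are hypothetical. HFCC and GFCC follow the same overall pattern but use a different filterbank, which is not shown here.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) matrix of MFCCs for a mono signal.

    All default parameter values are assumptions made for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:             # pad very short inputs to one frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)        # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)             # avoid log(0)
    return dct_ii(log_energies, n_ceps)

if __name__ == "__main__":
    # Synthetic one-second signal standing in for a speech utterance.
    rng = np.random.default_rng(0)
    features = mfcc(rng.standard_normal(16000))
    print(features.shape)
```

A per-utterance feature vector for a classifier such as an SVM could then be formed, for example, by averaging the frame-level coefficients over time; how the paper aggregates frames is not stated in the abstract.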
Author Nath, Malaya Kumar
Kar, Niladri
Sugan, N.
Kumar, L. S.
Kanhe, Aniruddha
Sai Srinivas, N. S.
Author_xml – sequence: 1
  givenname: N.
  surname: Sugan
  fullname: Sugan, N.
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
– sequence: 2
  givenname: N. S.
  surname: Sai Srinivas
  fullname: Sai Srinivas, N. S.
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
– sequence: 3
  givenname: Niladri
  surname: Kar
  fullname: Kar, Niladri
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
– sequence: 4
  givenname: L. S.
  surname: Kumar
  fullname: Kumar, L. S.
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
– sequence: 5
  givenname: Malaya Kumar
  surname: Nath
  fullname: Nath, Malaya Kumar
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
– sequence: 6
  givenname: Aniruddha
  surname: Kanhe
  fullname: Kanhe, Aniruddha
  organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
CitedBy_id crossref_primary_10_2139_ssrn_3869462
crossref_primary_10_1109_TIM_2023_3252631
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CETIC4.2018.8531065
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781538649664
1538649667
EndPage 271
ExternalDocumentID 8531065
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-c206t-bbe9b262f149e1fae07635dc6f1ff48bf7371bca61cbf22d2f3809e9288b24313
IEDL.DBID RIE
IngestDate Thu Jun 29 18:39:12 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c206t-bbe9b262f149e1fae07635dc6f1ff48bf7371bca61cbf22d2f3809e9288b24313
PageCount 6
ParticipantIDs ieee_primary_8531065
PublicationCentury 2000
PublicationDate 2018-07
PublicationDateYYYYMMDD 2018-07-01
PublicationDate_xml – month: 07
  year: 2018
  text: 2018-07
PublicationDecade 2010
PublicationTitle 2018 International CET Conference on Control, Communication, and Computing (IC4)
PublicationTitleAbbrev CETIC4
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.7688848
SourceID ieee
SourceType Publisher
StartPage 266
SubjectTerms artificial neural networks (ANN)
Emotion recognition
Feature extraction
gammatone frequency cepstral coefficients (GFCC)
human factor cepstral coefficients (HFCC)
Human factors
Mel frequency cepstral coefficient
mel frequency cepstral coefficients (MFCC)
Speech emotion recognition (SER)
Speech recognition
support vector machines (SVM)
Title Performance Comparison of Different Cepstral Features for Speech Emotion Recognition
URI https://ieeexplore.ieee.org/document/8531065
hasFullText 1
inHoldings 1
isFullTextHit
isPrint