Performance Comparison of Different Cepstral Features for Speech Emotion Recognition

Bibliographic Details
Published in	2018 International CET Conference on Control, Communication, and Computing (IC4) pp. 266 - 271
Main Authors	Sugan, N.; Sai Srinivas, N. S.; Kar, Niladri; Kumar, L. S.; Nath, Malaya Kumar; Kanhe, Aniruddha
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2018
Subjects	artificial neural networks (ANN); Emotion recognition; Feature extraction; gammatone frequency cepstral coefficients (GFCC); human factor cepstral coefficients (HFCC); Human factors; Mel frequency cepstral coefficient; mel frequency cepstral coefficients (MFCC); Speech emotion recognition (SER); Speech recognition; support vector machines (SVM)
Abstract	Speech emotion recognition (SER) systems are an important building block in an age where human-computer interaction plays an indispensable role. In this work, emotional speech samples are taken from two databases, namely the Berlin emotional speech database (Emo-DB) and the Surrey audio-visual expressed emotion speech database (SAVEE). Three cepstral features, namely mel-frequency cepstral coefficients (MFCC), human factor cepstral coefficients (HFCC) and gammatone frequency cepstral coefficients (GFCC), are extracted from the emotional speech samples. The extracted features represent the emotional content present in the speech signal and are used for training, validating and testing the classifiers. Two classifiers, namely the feedforward backpropagation artificial neural network (FF-BP-ANN) and the support vector machine (SVM), are used for developing the SER systems. These classifiers are trained to classify each input speech signal into one of the distinct emotional classes: anger, boredom, disgust, fear, happiness, neutral, sadness and surprise. Results on how accurately each of the three cepstral features recognizes emotions from the speech utterances of the two databases are presented. Finally, the performance of the SER systems is compared with respect to features, classifiers and the existing literature.
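The abstract names MFCC as one of the three cepstral features. As a rough illustration of what such a feature computes, the following is a minimal NumPy-only sketch of a textbook MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II). It is not the authors' implementation: the sample rate, frame length, hop size, FFT size, number of filters and number of coefficients are all assumed values, and the function names `mel_filterbank` and `mfcc` are hypothetical. The classifier stage (ANN or SVM) is not shown.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several variants in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumptions."""
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filterbank energies; the small constant avoids log(0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis yields the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Usage on a synthetic one-second tone (a stand-in for a speech sample).
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = mfcc(tone)
```

Under the same assumptions, HFCC and GFCC would differ mainly in the filterbank stage (filter bandwidths or gammatone filters in place of the triangular mel filters), while the log and DCT steps stay comparable. The per-frame coefficients would then typically be summarized per utterance before being passed to a classifier.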
Author	Nath, Malaya Kumar; Kar, Niladri; Sugan, N.; Kumar, L. S.; Kanhe, Aniruddha; Sai Srinivas, N. S.
Author_xml	– sequence: 1 givenname: N. surname: Sugan fullname: Sugan, N. organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal – sequence: 2 givenname: N. S. surname: Sai Srinivas fullname: Sai Srinivas, N. S. organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal – sequence: 3 givenname: Niladri surname: Kar fullname: Kar, Niladri organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal – sequence: 4 givenname: L. S. surname: Kumar fullname: Kumar, L. S. organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal – sequence: 5 givenname: Malaya Kumar surname: Nath fullname: Nath, Malaya Kumar organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal – sequence: 6 givenname: Aniruddha surname: Kanhe fullname: Kanhe, Aniruddha organization: Department of Electronics and Communication Engineering, National Institute of Technology Puducherry Karaikal, Thiruvettakudy, Karaikal
CitedBy_id	crossref_primary_10_2139_ssrn_3869462 crossref_primary_10_1109_TIM_2023_3252631
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/CETIC4.2018.8531065
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9781538649664 1538649667
EndPage	271
ExternalDocumentID	8531065
Genre	orig-research
GroupedDBID	6IE 6IL CBEJK RIE RIL
ID	FETCH-LOGICAL-c206t-bbe9b262f149e1fae07635dc6f1ff48bf7371bca61cbf22d2f3809e9288b24313
IEDL.DBID	RIE
IngestDate	Thu Jun 29 18:39:12 EDT 2023
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c206t-bbe9b262f149e1fae07635dc6f1ff48bf7371bca61cbf22d2f3809e9288b24313
PageCount	6
ParticipantIDs	ieee_primary_8531065
PublicationCentury	2000
PublicationDate	2018-07
PublicationDateYYYYMMDD	2018-07-01
PublicationDate_xml	– month: 07 year: 2018 text: 2018-07
PublicationDecade	2010
PublicationTitle	2018 International CET Conference on Control, Communication, and Computing (IC4)
PublicationTitleAbbrev	CETIC4
PublicationYear	2018
Publisher	IEEE
Publisher_xml	– name: IEEE
SourceID	ieee
SourceType	Publisher
StartPage	266
Title	Performance Comparison of Different Cepstral Features for Speech Emotion Recognition
URI	https://ieeexplore.ieee.org/document/8531065