Performance Comparison of Different Cepstral Features for Speech Emotion Recognition


Bibliographic Details
Published in: 2018 International CET Conference on Control, Communication, and Computing (IC4), pp. 266-271
Main Authors: Sugan, N., Sai Srinivas, N. S., Kar, Niladri, Kumar, L. S., Nath, Malaya Kumar, Kanhe, Aniruddha
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2018

Summary: A speech emotion recognition (SER) system is an important building block in an age where human-computer interaction plays an indispensable role. In this work, emotional speech samples are taken from two databases: the Berlin emotional speech database (Emo-DB) and the Surrey Audio-Visual Expressed Emotion database (SAVEE). Three cepstral features, namely mel-frequency cepstral coefficients (MFCC), human factor cepstral coefficients (HFCC), and gammatone frequency cepstral coefficients (GFCC), are extracted from the emotional speech samples. These features, which represent the emotional content of the speech signal, are used for training, validating, and testing the classifiers. Two classifiers, a feedforward backpropagation artificial neural network (FF-BP-ANN) and a support vector machine (SVM), are used for developing the SER systems. Each classifier is trained to assign an input speech signal to one of the emotional classes: anger, boredom, disgust, fear, happiness, neutral, sadness, and surprise. Results on the accuracy with which the three cepstral features recognize emotions from speech utterances of the two databases are presented. Finally, the SER systems are compared with respect to features, with respect to classifiers, and against results from the existing literature.
DOI: 10.1109/CETIC4.2018.8531065
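
The cepstral features the summary names all follow the same pipeline (frame the signal, window, take the power spectrum, apply an auditory filterbank, take the log, then a DCT); they differ mainly in the filterbank. As a rough illustration of that pipeline for the MFCC case only, here is a minimal NumPy/SciPy sketch. All parameter values (16 kHz sampling, 25 ms frames, 26 mel filters, 13 coefficients) are common textbook defaults, not settings taken from this paper.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame -> Hamming window -> power spectrum
    -> triangular mel filterbank -> log -> DCT. Illustrative only."""
    # split the signal into overlapping frames and window each one
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular filters spaced uniformly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, center, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, center):
            fbank[m - 1, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - center, 1)
    # log filterbank energies; small floor avoids log(0)
    fb_energy = np.log(power @ fbank.T + 1e-10)
    # DCT decorrelates the energies; keep the first n_ceps coefficients
    return dct(fb_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

# one second of noise at 16 kHz -> one feature row per frame
feats = mfcc(np.random.randn(16000))
print(feats.shape)  # (98, 13): 98 frames, 13 coefficients each
```

Frame-level feature matrices like this one are what would be fed (typically after some pooling or statistics over frames) to a classifier such as an SVM or a feedforward network, as the summary describes. Replacing the mel filterbank with a gammatone filterbank would give GFCC-style features; HFCC modifies the filter bandwidths.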