Speech emotion recognition using optimized genetic algorithm-extreme learning machine

Bibliographic Details
Published in	Multimedia Tools and Applications, Vol. 81, no. 17, pp. 23963-23989
Main Authors	Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Ayob, Masri; AL-Dhief, Fahad Taha; Omar, Khairuddin; Maen, Mhd Khaled
Format	Journal Article
Language	English
Published	New York: Springer US, 01.07.2022 (Springer Nature B.V.)
Subjects	Accuracy; Artificial neural networks; Boredom; Classification; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Emotion recognition; Emotion speech recognition; Emotions; Feature extraction; Females; Genetic algorithms; Human-computer interface; Machine learning; Males; Mel frequency cepstral coefficients; Multimedia Information Systems; Optimized genetic algorithm-extreme learning machine; Special Purpose and Application-Based Systems; Speech recognition

More Information
Summary:	Automatic Emotion Speech Recognition (ESR) is an active research field within the Human-Computer Interface (HCI) domain. Typically, an ESR system consists of two main parts: the Front-End (feature extraction) and the Back-End (classification). However, most previous ESR systems have focused only on feature extraction and neglected classification, even though classification is an essential part of an ESR system: its role is to map the features extracted from audio samples to their corresponding emotions. Moreover, most ESR systems have been evaluated under the Subject Independent (SI) scenario only. Therefore, in this paper, we focus on the Back-End (classification) and adopt our recently developed Extreme Learning Machine (ELM) variant, the Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we use Mel Frequency Cepstral Coefficients (MFCC) to extract features from the speech utterances. This work demonstrates the significance of the classification part in ESR systems by showing that it improves ESR accuracy. The proposed model was evaluated on the Berlin Emotional Speech (BES) dataset, which covers 7 emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust), under four evaluation scenarios: Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). The OGA-ELM achieved highest accuracies of 93.26%, 100.00%, 96.14%, and 97.10% for the SI, SD, GD-Male, and GD-Female scenarios, respectively. In addition, the proposed ESR system showed fast execution times in all experiments.
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-022-12747-w
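The summary describes a Back-End that classifies MFCC feature vectors with an ELM. As a minimal sketch, assuming a standard single-hidden-layer ELM (random, untrained input weights and biases, sigmoid activation, output weights solved in closed form with the Moore-Penrose pseudoinverse), such a classifier could look like the following. This is not the authors' OGA-ELM: the genetic-algorithm optimization is omitted, and the class name `BasicELM`, the hidden-layer size, and the synthetic 13-dimensional features (stand-ins for MFCC vectors, not BES data) are illustrative assumptions.

```python
import numpy as np


class BasicELM:
    """Hypothetical plain ELM classifier (no genetic-algorithm optimization)."""

    def __init__(self, n_hidden: int = 200, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer output H = sigmoid(X W + b) with fixed random W and b.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "BasicELM":
        self.classes_ = np.unique(y)
        # One-hot target matrix T of shape (n_samples, n_classes).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Only the output weights are learned: beta = pinv(H) @ T (least squares).
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Pick the class whose output score is largest.
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    # Synthetic stand-in for 13-dim MFCC vectors over 7 emotion classes.
    rng = np.random.default_rng(1)
    centers = rng.normal(0.0, 5.0, size=(7, 13))
    y = np.repeat(np.arange(7), 40)
    X = centers[y] + rng.normal(0.0, 1.0, size=(y.size, 13))
    # Z-score the features so the sigmoid units are not saturated.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    model = BasicELM(n_hidden=100, seed=0).fit((X - mu) / sd, y)
    print("training accuracy:", np.mean(model.predict((X - mu) / sd) == y))
```

The property that makes an ELM fast is that only `beta` is learned, in a single closed-form least-squares step, which is consistent with the fast execution times the summary reports; how OGA-ELM's genetic algorithm changes this procedure is not described in this record.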