Speech emotion recognition using optimized genetic algorithm-extreme learning machine

Bibliographic Details
Published in Multimedia Tools and Applications, Vol. 81, No. 17, pp. 23963-23989
Main Authors Albadr, Musatafa Abbas Abbood, Tiun, Sabrina, Ayob, Masri, AL-Dhief, Fahad Taha, Omar, Khairuddin, Maen, Mhd Khaled
Format Journal Article
Language English
Published New York Springer US 01.07.2022
Springer Nature B.V
Summary: Automatic Emotion Speech Recognition (ESR) is an active research field in the Human-Computer Interface (HCI). Typically, an ESR system consists of two main parts: a Front-End (feature extraction) and a Back-End (classification). However, most previous ESR systems have focused on the feature extraction part only and neglected the classification part, even though classification is essential in ESR systems: its role is to map the features extracted from audio samples to their corresponding emotions. Moreover, most ESR systems have been evaluated under the Subject Independent (SI) scenario only. Therefore, in this paper we focus on the Back-End (classification), adopting our recently developed Extreme Learning Machine (ELM), called the Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we use the Mel Frequency Cepstral Coefficients (MFCC) method to extract features from the speech utterances. This work demonstrates the significance of the classification part in ESR systems, showing that it improves ESR performance in terms of accuracy. The performance of the proposed model was evaluated on the Berlin Emotional Speech (BES) dataset, which contains 7 emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust). Four evaluation scenarios were conducted: Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). The highest performance of OGA-ELM was very impressive in all four scenarios, achieving accuracies of 93.26%, 100.00%, 96.14%, and 97.10% for the SI, SD, GD-Male, and GD-Female scenarios, respectively. Besides, the proposed ESR system showed a fast execution time in all experiments for identifying the emotions.
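To make the Back-End concrete, the following is a minimal sketch of a vanilla Extreme Learning Machine classifier in NumPy: random, fixed input-to-hidden weights, a sigmoid hidden layer, and output weights solved analytically via the Moore-Penrose pseudoinverse. This illustrates the basic ELM only; the paper's OGA-ELM additionally optimizes the hidden-layer parameters with a genetic algorithm, which is not reproduced here. All function names, the hidden-layer size, and the toy data (Gaussian blobs standing in for MFCC vectors) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden=64, n_classes=2):
    # ELM keeps the input-to-hidden weights random and fixed;
    # only the hidden-to-output weights (beta) are learned.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden activations
    T = np.eye(n_classes)[y]                # one-hot target matrix
    beta = np.linalg.pinv(H) @ T            # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Toy usage: two well-separated blobs of 13-dimensional vectors,
# mimicking the shape of 13-coefficient MFCC feature vectors.
X = np.vstack([rng.normal(loc=-2.0, size=(50, 13)),
               rng.normal(loc=+2.0, size=(50, 13))])
y = np.array([0] * 50 + [1] * 50)

W, b, beta = elm_train(X, y)
acc = np.mean(elm_predict(X, W, b, beta) == y)
```

Because training reduces to one pseudoinverse rather than iterative gradient descent, this analytic step is also what gives ELM-based systems like the one above their fast execution time.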
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-022-12747-w