Speech emotion recognition using optimized genetic algorithm-extreme learning machine
Automatic Emotion Speech Recognition (ESR) is considered as an active research field in the Human-Computer Interface (HCI). Typically, the ESR system is consisting of two main parts: Front-End (features extraction) and Back-End (classification). However, most previous ESR systems have been focused o...
Saved in:
Published in | Multimedia tools and applications Vol. 81; no. 17; pp. 23963 - 23989 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
Springer US
01.07.2022
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Automatic Emotion Speech Recognition (ESR) is considered as an active research field in the Human-Computer Interface (HCI). Typically, the ESR system is consisting of two main parts: Front-End (features extraction) and Back-End (classification). However, most previous ESR systems have been focused on the features extraction part only and ignored the classification part. Whilst the classification process is considered an essential part in ESR systems, where its role is to map out the extracted features from audio samples to determine its corresponding emotion. Moreover, the evaluation of most ESR systems has been conducted based on Subject Independent (SI) scenario only. Therefore, in this paper, we are focusing on the Back-End (classification), where we have adopted our recent developed Extreme Learning Machine (ELM), called Optimized Genetic Algorithm-Extreme Learning Machine (OGA-ELM). In addition, we used the Mel Frequency Cepstral Coefficients (MFCC) method in order to extract the features from the speech utterances. This work proves the significance of the classification part in ESR systems, where it improves the ESR performance in terms of achieving higher accuracy. The performance of the proposed model was evaluated based on Berlin Emotional Speech (BES) dataset which consists of 7 emotions (neutral, happiness, boredom, anxiety, sadness, anger, and disgust). Four different evaluation scenarios have been conducted such as Subject Dependent (SD), SI, Gender Dependent Female (GD-Female), and Gender Dependent Male (GD-Male). The highest performance of the OGA-ELM was very impressive in the four different scenarios and achieved an accuracy of 93.26%, 100.00%, 96.14% and 97.10% for SI, SD, GD-Male, and GD-Female scenarios, respectively. Besides, the proposed ESR system has shown a fast execution time in all experiments to identify the emotions. |
---|---|
ISSN: | 1380-7501 1573-7721 1573-7721 |
DOI: | 10.1007/s11042-022-12747-w |