A BiLSTM–Transformer and 2D CNN Architecture for Emotion Recognition from Speech
Published in | Electronics (Basel) Vol. 12; no. 19; p. 4034 |
---|---|
Main Authors | Kim, Sera; Lee, Seok-Pil |
Format | Journal Article |
Language | English |
Published | Basel, MDPI AG, 01.10.2023 |
Abstract | Emotion recognition technology continues to grow in significance, and research in this field enables artificial intelligence to understand and react to human emotions accurately. This study aims to improve emotion recognition from speech by using dimensionality reduction algorithms for visualization, which helps outline emotion-specific audio features. As a model for emotion recognition, we propose a new architecture that combines a bidirectional long short-term memory (BiLSTM)–Transformer with a 2D convolutional neural network (CNN). The BiLSTM–Transformer processes audio features to capture the sequence of speech patterns, while the 2D CNN processes Mel-spectrograms to capture the spatial details of the audio. The model is validated with 10-fold cross-validation. The proposed methodology was applied to Emo-DB and RAVDESS, two major speech emotion recognition databases, and achieved unweighted accuracies of 95.65% and 80.19%, respectively. These results indicate that the proposed Transformer-based deep learning model, combined with appropriate feature selection, can improve performance in emotion recognition from speech. |
---|---|
Author | Kim, Sera; Lee, Seok-Pil |
ContentType | Journal Article |
Copyright | 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.3390/electronics12194034 |
Discipline | Engineering |
EISSN | 2079-9292 |
ISSN | 2079-9292 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 19 |
Language | English |
ORCID | 0000-0003-2520-6681 |
OpenAccessLink | https://www.proquest.com/docview/2876392207?pq-origsite=%requestingapplication% |
PublicationDate | 2023-10-01 |
PublicationPlace | Basel |
PublicationTitle | Electronics (Basel) |
PublicationYear | 2023 |
Publisher | MDPI AG |
StartPage | 4034 |
SubjectTerms | Accuracy; Algorithms; Artificial intelligence; Artificial neural networks; Deep learning; Emotion recognition; Emotions; Machine learning; Neural networks; Spectrograms; Speech; Speech recognition; Transformers |
Title | A BiLSTM–Transformer and 2D CNN Architecture for Emotion Recognition from Speech |
URI | https://www.proquest.com/docview/2876392207 |
Volume | 12 |
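
The abstract reports unweighted accuracy under 10-fold cross-validation. The sketch below illustrates how such an evaluation could be set up. It is not the authors' code: it assumes that "unweighted accuracy" has its usual meaning in speech emotion recognition (the mean of per-class recalls) and that the folds are plain shuffled splits. The function names `k_fold_indices` and `unweighted_accuracy` are hypothetical and chosen only for this illustration.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train, test) index pairs from a shuffled k-fold split.

    Assumption: plain (non-stratified) folds; the abstract does not say
    whether the folds were stratified by emotion or by speaker.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)
```

As a small illustration of why the metric matters on imbalanced data: with true labels `["happy", "happy", "happy", "sad"]` and a model that always predicts `"happy"`, plain accuracy is 0.75, whereas `unweighted_accuracy` returns 0.5 because the recall for "sad" is zero.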