Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding
Published in | 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) pp. 1 - 6 |
---|---|
Main Authors | Atmaja, Bagus Tris; Zanjabila; Sasou, Akira |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 18.10.2022 |
Abstract | This paper demonstrates the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting three tasks (multitask learning): emotion, age, and native country. The pre-trained model was based on the wav2vec 2.0 large robust model and trained on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with two independent layers and shared layers connected to the output layers. The study explores multitask learning across different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and waveform normalizations for predicting paralinguistic information from speech. |
---|---|
Author | Sasou, Akira; Atmaja, Bagus Tris; Zanjabila |
Author_xml | – sequence: 1 givenname: Bagus Tris surname: Atmaja fullname: Atmaja, Bagus Tris email: b-atmaja@aist.go.jp organization: AIST,Tsukuba,Japan – sequence: 2 surname: Zanjabila fullname: Zanjabila email: zanjabilaabil@gmail.com organization: ITS,Surabaya,Indonesia – sequence: 3 givenname: Akira surname: Sasou fullname: Sasou, Akira email: a-sasou@aist.go.jp organization: AIST,Tsukuba,Japan |
ContentType | Conference Proceeding |
DOI | 10.1109/ACIIW57231.2022.10085991 |
EISBN | 1665454903 9781665454902 |
EndPage | 6 |
ExternalDocumentID | 10085991 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 6 |
PublicationCentury | 2000 |
PublicationDate | 2022-Oct.-18 |
PublicationDateYYYYMMDD | 2022-10-18 |
PublicationDate_xml | – month: 10 year: 2022 text: 2022-Oct.-18 day: 18 |
PublicationDecade | 2020 |
PublicationTitle | 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) |
PublicationTitleAbbrev | ACIIW |
PublicationYear | 2022 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | acoustic embedding; affective computing; age prediction; Conferences; country prediction; Emotion recognition; Feature extraction; Harmonic analysis; Measurement; multitask learning; Predictive models; speech emotion recognition; Speech recognition |
Title | Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding |
URI | https://ieeexplore.ieee.org/document/10085991 |
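
The abstract states that a single harmonic mean of three metrics was used to evaluate multitask performance. The record does not say which metrics were used or how they were scaled, so the following is only a minimal sketch under stated assumptions: each task is assumed to yield one score in (0, 1] where higher is better, and the function name, the variable names, and the score values are all hypothetical.

```python
def harmonic_mean(scores):
    """Return the harmonic mean n / sum(1 / s) of the given scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    if any(s <= 0 for s in scores):
        # Assumption: a non-positive task score is treated as a
        # combined score of zero instead of raising an error.
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical per-task scores (not results from the paper).
emotion_score = 0.5
age_score = 0.25
country_score = 0.5

overall = harmonic_mean([emotion_score, age_score, country_score])
arithmetic = (emotion_score + age_score + country_score) / 3
print(overall, arithmetic)
```

With these illustrative values the harmonic mean is lower than the arithmetic mean, because it is pulled toward the weakest task. That property is a plausible reason to prefer it for multitask evaluation: a model cannot reach a high combined score by doing well on only two of the three tasks.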