Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Bibliographic Details
Published in	2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 1-6
Main Authors	Atmaja, Bagus Tris; Zanjabila; Sasou, Akira
Format	Conference Proceeding
Language	English
Published	IEEE 18.10.2022
Subjects	acoustic embedding; affective computing; age prediction; Conferences; country prediction; Emotion recognition; Feature extraction; Harmonic analysis; Measurement; multitask learning; Predictive models; speech emotion recognition; Speech recognition

Abstract	In this paper, we demonstrated the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting three tasks (multitask learning): emotion, age, and native country. The pre-trained model was obtained by training the wav2vec 2.0 large robust model on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with two independent layers and shared layers connected to the output layers. This study explores multitask learning across different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and waveform normalizations for predicting paralinguistic information from speech.
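
The abstract states that a single harmonic mean of three metrics summarizes multitask performance. A minimal sketch of that aggregation follows, assuming each per-task score is strictly positive and oriented so that higher is better. The record does not say which metrics were used, so the function name `multitask_score`, its parameter names, and the example values are hypothetical placeholders rather than the authors' implementation or results.

```python
def multitask_score(emotion_metric, age_metric, country_metric):
    """Combine three per-task scores into one number via the harmonic mean.

    Assumption: every score is strictly positive and higher means better.
    The harmonic mean is dominated by the weakest task, so a model cannot
    hide one poor task behind two strong ones.
    """
    scores = [emotion_metric, age_metric, country_metric]
    if any(s <= 0 for s in scores):
        raise ValueError("harmonic mean needs strictly positive scores")
    # H = n / (1/s1 + 1/s2 + ... + 1/sn)
    return len(scores) / sum(1.0 / s for s in scores)


# Placeholder values for illustration only, not results from the paper.
balanced = multitask_score(0.5, 0.5, 0.5)
unbalanced = multitask_score(1.0, 0.5, 0.25)
```

With equal inputs the harmonic mean equals that common value, while the unbalanced example falls below the arithmetic mean of its inputs, which is why this kind of aggregate rewards balanced performance across tasks.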
Author affiliations	1. Atmaja, Bagus Tris (AIST, Tsukuba, Japan; b-atmaja@aist.go.jp); 2. Zanjabila (ITS, Surabaya, Indonesia; zanjabilaabil@gmail.com); 3. Sasou, Akira (AIST, Tsukuba, Japan; a-sasou@aist.go.jp)
DOI	10.1109/ACIIW57231.2022.10085991
EISBN	1665454903 9781665454902
URI	https://ieeexplore.ieee.org/document/10085991