Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Bibliographic Details
Published in 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 1-6
Main Authors Atmaja, Bagus Tris; Zanjabila; Sasou, Akira
Format Conference Proceeding
Language English
Published IEEE 18.10.2022
Subjects
Abstract In this paper, we demonstrated the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting, via multitask learning, three targets: emotion, age, and native country. The pre-trained model was the wav2vec 2.0 large robust model trained on a speech emotion corpus. Emotion and age prediction were treated as regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with two independent layers and shared layers connected to the output layers. The study explores multitask learning across different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and waveform normalizations for predicting paralinguistic information from speech.
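The single evaluation score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the three metrics or say how they are scaled, so the function name `multitask_score`, its parameter names, and the assumption that each task score is positive with higher meaning better are all hypothetical and are not taken from the paper.

```python
from statistics import harmonic_mean


def multitask_score(emotion_score, age_score, country_score):
    """Combine three per-task scores into one number via the harmonic mean.

    Assumption (not from the abstract): every score is positive and
    oriented so that higher is better.
    """
    scores = [emotion_score, age_score, country_score]
    if any(s <= 0 for s in scores):
        raise ValueError("each score must be positive")
    # Harmonic mean: n / (1/s1 + 1/s2 + 1/s3)
    return harmonic_mean(scores)


# Illustrative values only; these are not results from the paper.
example = multitask_score(0.5, 0.5, 0.25)  # 3 / (2 + 2 + 4) = 0.375
```

Because the harmonic mean is pulled toward the smallest input, a model cannot reach a high combined score by doing well on two tasks while neglecting the third.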
Author Sasou, Akira
Atmaja, Bagus Tris
Zanjabila
Author_xml – sequence: 1
  givenname: Bagus Tris
  surname: Atmaja
  fullname: Atmaja, Bagus Tris
  email: b-atmaja@aist.go.jp
  organization: AIST,Tsukuba,Japan
– sequence: 2
  surname: Zanjabila
  fullname: Zanjabila
  email: zanjabilaabil@gmail.com
  organization: ITS,Surabaya,Indonesia
– sequence: 3
  givenname: Akira
  surname: Sasou
  fullname: Sasou, Akira
  email: a-sasou@aist.go.jp
  organization: AIST,Tsukuba,Japan
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ACIIW57231.2022.10085991
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1665454903
9781665454902
EndPage 6
ExternalDocumentID 10085991
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i204t-12c0c9cadb8db6db5213ff3a0cc331e23045e62d40d0d6510a79af456abec5f63
IEDL.DBID RIE
IngestDate Thu Jan 18 11:14:29 EST 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-12c0c9cadb8db6db5213ff3a0cc331e23045e62d40d0d6510a79af456abec5f63
PageCount 6
ParticipantIDs ieee_primary_10085991
PublicationCentury 2000
PublicationDate 2022-Oct.-18
PublicationDateYYYYMMDD 2022-10-18
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-Oct.-18
  day: 18
PublicationDecade 2020
PublicationTitle 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)
PublicationTitleAbbrev ACIIW
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.834862
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms acoustic embedding
affective computing
age prediction
Conferences
country prediction
Emotion recognition
Feature extraction
Harmonic analysis
Measurement
multitask learning
Predictive models
speech emotion recognition
Speech recognition
Title Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding
URI https://ieeexplore.ieee.org/document/10085991
hasFullText 1
inHoldings 1
isFullTextHit
isPrint