Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding
Published in | 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) pp. 1 - 6 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 18.10.2022 |
Subjects | |
Summary: | This paper demonstrates the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting three tasks in a multitask learning setup: emotion, age, and native country. The pre-trained model was based on the wav2vec 2.0 large robust model and trained on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with two independent layers and shared layers connected to the output layers. The study explores multitask learning across different acoustic features (including the acoustic embeddings extracted from a model trained on an affective speech dataset), random seeds, batch sizes, and waveform normalizations for predicting paralinguistic information from speech. |
---|---|
DOI: | 10.1109/ACIIW57231.2022.10085991 |
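
The summary states that a single harmonic mean of three metrics was used to score the multitask model, but it does not name the per-task metrics. As a rough illustration only, the sketch below combines three scores with a harmonic mean. The function name `combined_score` and the assumption that each score is non-negative with higher meaning better are hypothetical and not taken from the paper.

```python
def combined_score(emotion_score, age_score, country_score):
    """Harmonic mean of three per-task scores.

    Assumption (not from the paper): each score is non-negative and
    higher is better, so an error-type metric would need to be
    converted to a "higher is better" score before being passed in.
    """
    scores = (emotion_score, age_score, country_score)
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    # The harmonic mean is taken as zero when any single score is zero.
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    print(combined_score(0.6, 0.3, 0.6))
```

Unlike an arithmetic mean, the harmonic mean is pulled toward the lowest of the three scores, so a model cannot obtain a high combined score by doing well on two tasks while neglecting the third.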