Cross-Lingual Transfer for Russian Speech Emotion Automatic Recognition: Data and Trends
Published in | Automatic Documentation and Mathematical Linguistics, Vol. 59, no. 3, pp. 166-176 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Moscow; Pleiades Publishing; 01.06.2025; Springer Nature B.V. |
Summary: | This study examines how differences between source languages and training data affect the quality of cross-lingual transfer of a trained speech model to Russian in automatic speech emotion recognition. English, Polish, Chinese, and Japanese served as source languages, represented by the IEMOCAP, nEMO, ESD, and JVNV emotional speech datasets, respectively; the model was HuBERT, a transformer-based speech model. Each model trained on one source dataset was tested on a reduced sample of the Dusha Russian emotional speech dataset. From the results, the authors consider the main trends in choosing a source language for training the speech model and then transferring it to Russian, and analyze differences between the datasets, which point to the need for further work on collecting and labeling high-quality emotional speech data. |
---|---|
ISSN: | 0005-1055, 1934-8371 |
DOI: | 10.3103/S000510552570058X |
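
The summary describes training one model per source language and testing each on the same Russian sample. A minimal sketch of how such a per-source comparison could be scored is given below. It assumes unweighted average recall as the metric, which the record does not state, and it uses made-up toy labels in place of real model outputs. The names `unweighted_average_recall`, `target_labels`, and `predictions_by_source` are hypothetical.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so rare emotions weigh as much as common ones."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels for a shared Russian test sample (not data from the paper).
target_labels = ["angry", "angry", "sad", "neutral", "neutral", "neutral"]

# Hypothetical predictions from models trained on different source languages.
predictions_by_source = {
    "English": ["angry", "sad", "sad", "neutral", "neutral", "angry"],
    "Polish": ["angry", "angry", "neutral", "neutral", "neutral", "neutral"],
}

# One score per source language, all computed on the same target sample.
scores = {
    source: unweighted_average_recall(target_labels, preds)
    for source, preds in predictions_by_source.items()
}

for source, score in scores.items():
    print(f"{source} -> Russian: UAR = {score:.3f}")
```

Scoring every source model on the same target sample keeps the comparison between source languages like for like. A class-balanced metric is one reasonable choice when emotion classes are unevenly represented.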