Mongolian Pronunciation Data Enhancement Method Based on Separation Contrast Learning
Published in | 2024 International Conference on Asian Language Processing (IALP) pp. 314 - 319 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 04.08.2024 |
Summary: | When Mongolian speech data is augmented, the model overfits the noise in the out-of-domain speech dataset, so the augmented speech is contaminated by noise. This paper proposes a Mongolian Pronunciation Data Enhancement Method based on Separated Contrastive Learning (PEM-SC). PEM-SC consists of a feature separation module and a contrast enhancement module. By comparing the similarities and differences between noisy speech and clean speech, the method highlights the target features, further improving the accuracy and clarity of pronunciation. Across multiple sets of experiments, the generated pronunciation achieves a MOS of 4.01, with MCD, CBAK, PESQ, and ESTOI of 19.73, 2.88, 3.851, and 0.883, respectively. |
---|---|
ISSN: | 2159-1970 |
DOI: | 10.1109/IALP63756.2024.10661109 |