Mongolian Pronunciation Data Enhancement Method Based on Separation Contrast Learning

Bibliographic Details
Published in: 2024 International Conference on Asian Language Processing (IALP), pp. 314-319
Main Authors: Guo, Siyuan; Ma, Zhiqiang; Sun, Jiaqi; Qin, Yixiong
Format: Conference Proceeding
Language: English
Published: IEEE, 04.08.2024
Summary: When Mongolian speech data is augmented, the model overfits the noise in the out-of-domain speech dataset, so the augmented speech is corrupted by noise. This paper proposes a Mongolian Pronunciation Data Enhancement Method based on Separated Contrastive Learning (PEM-SC). PEM-SC consists of a feature separation module and a contrast enhancement module. By comparing the similarities and differences between noisy speech and clean speech, it highlights the target features, further improving the accuracy and clarity of pronunciation. Across multiple sets of experiments, the generated pronunciation achieves a MOS of 4.01, with MCD, CBAK, PESQ, and ESTOI of 19.73, 2.88, 3.851, and 0.883, respectively.
ISSN: 2159-1970
DOI: 10.1109/IALP63756.2024.10661109