Mongolian Pronunciation Data Enhancement Method Based on Separation Contrast Learning

Bibliographic Details
Published in	2024 International Conference on Asian Language Processing (IALP) pp. 314 - 319
Main Authors	Guo, Siyuan; Ma, Zhiqiang; Sun, Jiaqi; Qin, Yixiong
Format	Conference Proceeding
Language	English
Published	IEEE 04.08.2024
Subjects	Contrastive learning; data augmentation; data enhancement; Data models; Generators; Interference; Mongolian; Noise; Speech enhancement; Text-to-Speech; Vocoders

Summary:	When augmenting Mongolian speech data, the model overfits the noise in the out-of-domain speech dataset, so the augmented speech is contaminated by noise. This paper proposes a Mongolian Pronunciation Data Enhancement Method based on Separated Contrastive Learning (PEM-SC). PEM-SC consists of a feature separation module and a contrast enhancement module. By comparing the similarities and differences between noisy and clean speech, the method highlights the target features, which further improves the accuracy and clarity of pronunciation. Across multiple sets of experiments, the generated pronunciation achieves a MOS of 4.01, with MCD, CBAK, PESQ, and ESTOI of 19.73, 2.88, 3.851, and 0.883, respectively.
ISSN:	2159-1970
DOI:	10.1109/IALP63756.2024.10661109