Speech training data set enhancement method and device, equipment and storage medium

Bibliographic Details
Main Authors	WANG JIANZONG, TANG YANXI, QU XIAOYANG
Format	Patent
Language	Chinese, English
Published	10.08.2021
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Summary:	The invention provides a speech training data set enhancement method, device, equipment and storage medium. The method comprises the steps of: extracting the Mel spectrogram corresponding to each piece of speech training data; performing pixel rearrangement on it to obtain a temporary Mel spectrogram; setting the area of an erasure region, and setting a shape parameter, a change parameter or a random erasure coefficient of the erasure region for each temporary Mel spectrogram, to obtain a plurality of extended Mel spectrograms; and converting each extended Mel spectrogram into corresponding target speech training data, thereby supplementing the speech training data. The method solves the problem that a speech model is prone to overfitting during training when little speech training data is available, the robustness of the speech model is improved, the speech model is prevented from being caught in overfi
Bibliography:	Application Number: CN202110610940
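
The steps named in the summary (pixel rearrangement, then erasing a region of the spectrogram) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented method: the record does not say how pixels are rearranged or how the erasure parameters are defined, so shuffling time frames within small blocks, the single rectangular erasure region, and every function and parameter name here (`rearrange_pixels`, `random_erase`, `augment`, `block`, `area_frac`, `aspect`) are hypothetical. Extracting the Mel spectrogram from audio and converting it back (for example with librosa's `melspectrogram` and `mel_to_audio`) is left out, so the sketch works on a plain NumPy array.

```python
import numpy as np


def rearrange_pixels(mel, block, rng):
    """Shuffle time frames inside consecutive blocks of `block` frames.

    Assumed interpretation of "pixel point rearrangement"; the record
    does not define the actual scheme.
    """
    out = mel.copy()
    n_frames = mel.shape[1]
    for start in range(0, n_frames, block):
        idx = np.arange(start, min(start + block, n_frames))
        out[:, idx] = mel[:, rng.permutation(idx)]
    return out


def random_erase(mel, area_frac, aspect, rng):
    """Erase one rectangle covering roughly `area_frac` of the spectrogram.

    `aspect` is the height/width ratio of the rectangle (a hypothetical
    stand-in for the "shape parameter"). Erased cells are set to the
    minimum value of the input.
    """
    n_mels, n_frames = mel.shape
    area = area_frac * n_mels * n_frames
    h = int(round(np.sqrt(area * aspect)))
    w = int(round(np.sqrt(area / aspect)))
    h = max(1, min(h, n_mels))
    w = max(1, min(w, n_frames))
    top = rng.integers(0, n_mels - h + 1)
    left = rng.integers(0, n_frames - w + 1)
    out = mel.copy()
    out[top:top + h, left:left + w] = mel.min()
    return out


def augment(mel, n_copies, seed=0):
    """Return `n_copies` extended spectrograms derived from one input."""
    rng = np.random.default_rng(seed)
    extended = []
    for _ in range(n_copies):
        temp = rearrange_pixels(mel, block=4, rng=rng)  # temporary spectrogram
        aspect = rng.uniform(0.5, 2.0)                  # vary the region shape
        extended.append(random_erase(temp, area_frac=0.1, aspect=aspect, rng=rng))
    return extended


if __name__ == "__main__":
    # Synthetic stand-in for a Mel spectrogram: 80 Mel bands x 100 frames.
    mel = np.abs(np.random.default_rng(1).normal(size=(80, 100))) + 1.0
    copies = augment(mel, n_copies=3)
    print(len(copies), copies[0].shape)
```

Drawing a different `aspect` for each copy is one simple way to get several distinct extended spectrograms from a single input; a real pipeline would still need to convert each result back into audio.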