RepackagingAugment: Overcoming Prediction Error Amplification in Weight-Averaged Speech Recognition Models Subject to Self-Training

Bibliographic Details
Published in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), pp. 1-5
Main Authors: Lee, Jae-Hong; Kim, Dong-Hyun; Chang, Joon-Hyuk
Format: Conference Proceeding
Language: English
Published: IEEE, 04.06.2023

Summary: Representation-based speech recognition models have demonstrated state-of-the-art performance on downstream tasks. These models are pre-trained on large-scale unlabeled data, fine-tuned on a small amount of labeled data, and subsequently advanced via self-training on pseudo-labels. However, a self-trained representation model produces prediction errors because it is trained on incorrect labels in the pseudo-labeled data. Weight-averaging methods have been employed in a variety of studies to refine the pseudo-labels; however, these methods amplify the prediction errors of each self-trained model. To alleviate this problem, we propose RepackagingAugment, a data augmentation method that improves the diversity of models while preventing the same incorrect labels from recurring in every epoch. Our data augmentation deconstructs the paired speech-text data into word units and repackages them into a randomly determined number of word sequences. This strategy induces the models to produce different prediction errors by mitigating incorrect-label overfitting. Through experiments on representation models such as wav2vec 2.0 and data2vec, we demonstrate that our approach improves the performance of weight-averaged models.
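
The abstract describes the augmentation only at a high level. As a minimal, hypothetical sketch (not the authors' implementation), the Python below illustrates one way such repackaging could be realized, assuming word-level time alignments are available for each utterance (e.g., from forced alignment). The function name, the cross-utterance shuffle, and the min_words/max_words bounds are illustrative assumptions, not details taken from the paper.

import random
import numpy as np

def repackage_augment(utterances, rng=None, min_words=2, max_words=12):
    """Deconstruct paired speech-text data into word units and repackage
    them into new, randomly sized word sequences (hypothetical sketch).

    utterances: list of (waveform, alignments) pairs, where waveform is a
    1-D numpy array of samples and alignments is a list of
    (word, start_sample, end_sample) tuples, e.g. from forced alignment.
    """
    rng = rng or random.Random(0)

    # 1) Deconstruct: cut every utterance into (audio segment, word) units.
    word_units = []
    for waveform, alignments in utterances:
        for word, start, end in alignments:
            word_units.append((waveform[start:end], word))

    # 2) Shuffle so repackaged sequences mix words across utterances
    #    (an assumption here), varying the examples seen at each epoch.
    rng.shuffle(word_units)

    # 3) Repackage: group word units into sequences of randomly drawn
    #    length, yielding new paired speech-text training examples.
    repackaged = []
    i = 0
    while i < len(word_units):
        n = rng.randint(min_words, max_words)
        chunk = word_units[i:i + n]
        audio = np.concatenate([segment for segment, _ in chunk])
        transcript = " ".join(word for _, word in chunk)
        repackaged.append((audio, transcript))
        i += n
    return repackaged

For example, two aligned utterances could be recombined via repackage_augment([(wave_a, align_a), (wave_b, align_b)]), and the resulting speech-text pairs fed to the next self-training round so that each model sees differently packaged data.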
ISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10096146