RepackagingAugment: Overcoming Prediction Error Amplification in Weight-Averaged Speech Recognition Models Subject to Self-Training

Bibliographic Details
Published in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), pp. 1-5
Main Authors: Lee, Jae-Hong; Kim, Dong-Hyun; Chang, Joon-Hyuk
Format: Conference Proceeding
Language: English
Published: IEEE, 04.06.2023

Summary: Representation-based speech recognition models have demonstrated state-of-the-art performance on downstream tasks. These models are pre-trained on large-scale unlabeled data, fine-tuned on a small amount of labeled data, and subsequently advanced via self-training on pseudo-labels. However, a self-trained representation model produces prediction errors because it is trained on incorrect labels in the pseudo-labeled data. Weight-averaging methods have been employed in a variety of studies to refine the pseudo-labels; however, these methods amplify the prediction errors of each self-trained model. To alleviate this problem, we propose RepackagingAugment, a data augmentation method that improves the diversity of models while preventing the same incorrect labels from recurring in every epoch. Our data augmentation deconstructs the paired speech-text data into word units and repackages them into a randomly determined number of word sequences. This strategy induces the models to produce different prediction errors by mitigating incorrect-label overfitting. Through experiments on representation models such as wav2vec 2.0 and data2vec, we demonstrate that our approach improves the performance of weight-averaged models.
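
The abstract describes the augmentation only at a high level. As a minimal, hypothetical sketch (not the authors' implementation), the Python below illustrates one way such repackaging could be realized, assuming word-level time alignments are available for each utterance (e.g., from forced alignment). The function name, the cross-utterance shuffle, and the min_words/max_words bounds are illustrative assumptions, not details taken from the paper.

import random
import numpy as np

def repackage_augment(utterances, rng=None, min_words=2, max_words=12):
    """Deconstruct paired speech-text data into word units and repackage
    them into new, randomly sized word sequences (hypothetical sketch).

    utterances: list of (waveform, alignments) pairs, where waveform is a
    1-D numpy array of samples and alignments is a list of
    (word, start_sample, end_sample) tuples, e.g. from forced alignment.
    """
    rng = rng or random.Random(0)

    # 1) Deconstruct: cut every utterance into (audio segment, word) units.
    word_units = []
    for waveform, alignments in utterances:
        for word, start, end in alignments:
            word_units.append((waveform[start:end], word))

    # 2) Shuffle so repackaged sequences mix words across utterances
    #    (an assumption here), varying the examples seen at each epoch.
    rng.shuffle(word_units)

    # 3) Repackage: group word units into sequences of randomly drawn
    #    length, yielding new paired speech-text training examples.
    repackaged = []
    i = 0
    while i < len(word_units):
        n = rng.randint(min_words, max_words)
        chunk = word_units[i:i + n]
        audio = np.concatenate([segment for segment, _ in chunk])
        transcript = " ".join(word for _, word in chunk)
        repackaged.append((audio, transcript))
        i += n
    return repackaged

For example, two aligned utterances could be recombined via repackage_augment([(wave_a, align_a), (wave_b, align_b)]), and the resulting speech-text pairs fed to the next self-training round so that each model sees differently packaged data.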
ISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10096146