Token-Weighted RNN-T for Learning from Flawed Data


Bibliographic Details
Published in: arXiv.org
Main Authors: Gil, Keren; Zhou, Wei; Kalinli, Ozlem
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 26.06.2024
Summary: ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights. The new objective is used to mitigate accuracy loss from transcription errors in the training data, which arise naturally in two settings: pseudo-labeling and human annotation errors. Experimental results show that using our method for semi-supervised learning with pseudo-labels leads to a consistent accuracy improvement, up to 38% relative. We also analyze the accuracy degradation resulting from different levels of WER in the reference transcriptions, and show that token-weighted RNN-T is suitable for overcoming this degradation, recovering 64%-99% of the accuracy loss.
ISSN: 2331-8422
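
Note on the reported figures: the summary gives a relative improvement ("up to 38% relative") and a recovered share of accuracy loss ("64%-99%"). The sketch below shows one common way such quantities are computed from WER values. It is an illustration only: the function names and the WER numbers are hypothetical, and it assumes that "accuracy loss" means the WER gap between training on clean and on flawed transcriptions, which the record itself does not spell out.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def recovered_fraction(clean_wer: float, noisy_wer: float, method_wer: float) -> float:
    """Share of the degradation (noisy_wer - clean_wer) that the method wins back.

    Assumed interpretation: clean_wer comes from training on correct
    transcriptions, noisy_wer from training on flawed ones, and method_wer
    from training on flawed ones with the mitigation applied.
    """
    degradation = noisy_wer - clean_wer
    if degradation <= 0:
        raise ValueError("noisy_wer must exceed clean_wer")
    return (noisy_wer - method_wer) / degradation


# Hypothetical WER values in percent, chosen only to illustrate the arithmetic.
print(f"{relative_reduction(10.0, 8.0):.0%}")       # 20%
print(f"{recovered_fraction(8.0, 12.0, 9.0):.0%}")  # 75%
```

A recovered fraction of 100% would mean the model trained on flawed transcriptions matches the one trained on clean transcriptions.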