Using fine-tuning and min lookahead beam search to improve Whisper


Bibliographic Details
Main Authors: Do, Andrea; Brown, Oscar; Wang, Zhengjie; Mathew, Nikhil; Liu, Zixin; Ahmed, Jawwad; Yu, Cheng
Format: Journal Article
Language: English
Published: 19.09.2023

Summary: The performance of Whisper in low-resource languages is still far from perfect. In addition to a lack of training data for low-resource languages, we identify some limitations of the beam search algorithm used in Whisper. To address these issues, we fine-tune Whisper on additional data and propose an improved decoding algorithm. On Vietnamese, fine-tuning Whisper-Tiny with LoRA reduces WER by 38.49 relative to the zero-shot Whisper-Tiny setting, a further reduction of 1.45 compared to full-parameter fine-tuning. Additionally, using the Filter-Ends and Min Lookahead decoding algorithms reduces WER by 2.26 on average across a range of languages compared to standard beam search. These results generalise to larger Whisper model sizes. We also prove a theorem that Min Lookahead outperforms the standard beam search algorithm used in Whisper.
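The core idea behind lookahead decoding is to rank beam candidates not by their current cumulative log-probability alone, but by the best score reachable within a few additional decoding steps, so the beam is less likely to discard prefixes whose strong continuations appear slightly later. The sketch below illustrates that ranking idea only; it is a minimal assumption-laden toy, not the paper's actual algorithm. The table-based `log_probs` model, the vocabulary, and all parameter names are invented stand-ins for Whisper's decoder, and Filter-Ends is not modelled here.

```python
import math

# Toy next-token log-probability model over a tiny vocabulary.
# In practice this would be Whisper's decoder; a fixed table is
# used here purely so the sketch is self-contained.
def log_probs(prefix):
    if prefix and prefix[-1] == "a":
        return {"a": math.log(0.2), "b": math.log(0.7), "<eos>": math.log(0.1)}
    return {"a": math.log(0.6), "b": math.log(0.3), "<eos>": math.log(0.1)}

def lookahead_score(prefix, score, depth):
    """Best cumulative log-prob reachable within `depth` more steps."""
    if depth == 0 or (prefix and prefix[-1] == "<eos>"):
        return score
    return max(lookahead_score(prefix + [tok], score + lp, depth - 1)
               for tok, lp in log_probs(prefix).items())

def lookahead_beam_search(beam_size=2, max_len=5, depth=2):
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))  # finished; carry forward
                continue
            for tok, lp in log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Key difference from standard beam search: rank candidates by
        # their lookahead score rather than their current score.
        candidates.sort(key=lambda c: lookahead_score(c[0], c[1], depth),
                        reverse=True)
        beams = candidates[:beam_size]
    return beams

best_sequence, best_score = lookahead_beam_search()[0]
print(best_sequence, best_score)
```

With `depth=0` the ranking reduces to ordinary beam search, which makes the lookahead term easy to isolate when comparing the two strategies on a real decoder.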
DOI: 10.48550/arxiv.2309.10299