Using fine-tuning and min lookahead beam search to improve Whisper
Format | Journal Article |
Language | English |
Published | 19.09.2023 |
Summary: | The performance of Whisper in low-resource languages is still far from perfect. In addition to a lack of training data for low-resource languages, we identify some limitations in the beam search algorithm used in Whisper. To address these issues, we fine-tune Whisper on additional data and propose an improved decoding algorithm. On Vietnamese, fine-tuning Whisper-Tiny with LoRA yields a 38.49-point WER improvement over the zero-shot Whisper-Tiny setting, a further reduction of 1.45 points compared to full-parameter fine-tuning. Additionally, the Filter-Ends and Min Lookahead decoding algorithms reduce WER by 2.26 points on average across a range of languages compared to standard beam search. These results generalise to larger Whisper model sizes. We also prove a theorem that Min Lookahead outperforms the standard beam search algorithm used in Whisper. |
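
The summary only names the decoding change, so the following is a minimal sketch of what a min-lookahead beam search could look like. Since the abstract does not spell out the scoring rule, this assumes each candidate is ranked by its cumulative log-probability plus the worst per-step log-probability seen along a short greedy lookahead; `toy_logprobs`, `min_lookahead_score`, and `beam_search` are hypothetical stand-ins, not the paper's implementation, and a real system would query Whisper's decoder instead of the toy model.

```python
# Hypothetical sketch only: the scoring rule below is an assumption, not the
# algorithm from arXiv:2309.10299. A real implementation would query Whisper's
# decoder instead of `toy_logprobs`.
import math

VOCAB = ["a", "b", "c", "<eos>"]

def toy_logprobs(prefix):
    """Stand-in model: return {token: log-probability} for the next token."""
    n = len(prefix)
    weights = [((i + 1) * (n + 1)) % 7 + 1 for i in range(len(VOCAB))]
    total = sum(weights)
    return {tok: math.log(w / total) for tok, w in zip(VOCAB, weights)}

def min_lookahead_score(prefix, logprob, depth):
    """Assumed ranking: cumulative log-probability plus the minimum (worst)
    per-step log-probability found along a short greedy lookahead."""
    if prefix and prefix[-1] == "<eos>":
        return logprob  # finished hypotheses need no lookahead
    cur, worst = list(prefix), 0.0
    for _ in range(depth):
        dist = toy_logprobs(cur)
        tok, lp = max(dist.items(), key=lambda kv: kv[1])  # greedy step
        worst = min(worst, lp)
        if tok == "<eos>":
            break
        cur.append(tok)
    return logprob + worst

def beam_search(beam_size=2, max_len=6, lookahead=2):
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))  # carry finished beams over
                continue
            for tok, lp in toy_logprobs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Rank by the lookahead-adjusted score rather than raw log-probability.
        candidates.sort(key=lambda c: min_lookahead_score(c[0], c[1], lookahead),
                        reverse=True)
        beams = candidates[:beam_size]
    return beams

if __name__ == "__main__":
    for prefix, score in beam_search():
        print(" ".join(prefix), f"{score:.3f}")
```

The intuition behind ranking on the minimum lookahead log-probability is pessimism: a candidate that looks strong now but leads into a low-probability step is penalised, which standard beam search (scoring on the running total alone) cannot see.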
DOI: | 10.48550/arxiv.2309.10299 |