Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
Published in: arXiv.org
Main Authors:
Format: Paper; Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 27.09.2024
Subjects:
Summary: Continual Learning (CL) involves fine-tuning pre-trained models with new data while maintaining the performance on the pre-trained data. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, mainly designed for computer vision and reinforcement learning tasks, often yield sub-optimal results when directly applied to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder. They include decoder-layer gradient surgery, freezing unused token embeddings, suppressing output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of pretrained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER of new languages.
ISSN: 2331-8422
DOI: 10.48550/arxiv.2407.03645
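The abstract names four decoder-side optimizations but this record carries no implementation details. Below is a minimal, hypothetical PyTorch sketch of how three of them are commonly realized: freezing embeddings of tokens unused by the fine-tuning data, suppressing logits of newly added tokens when decoding pretrained languages, and PCGrad-style gradient projection (one standard form of "gradient surgery"). Every name here (`used_token_ids`, `new_token_ids`, `g_new`, `g_old`) is an illustrative assumption, not the authors' code; learning-rate re-scaling is omitted since it is typically just a per-parameter-group multiplier on the optimizer's learning rate.

```python
import torch


def freeze_unused_embeddings(embedding: torch.nn.Embedding,
                             used_token_ids: torch.Tensor) -> None:
    """Zero gradients for embedding rows of tokens that never occur in the
    fine-tuning data, so only rows actually used by the new languages move."""
    mask = torch.zeros(embedding.num_embeddings, 1,
                       device=embedding.weight.device)
    mask[used_token_ids] = 1.0
    # The hook scales gradients row-wise; unused rows receive zero gradient.
    embedding.weight.register_hook(lambda grad: grad * mask)


def suppress_new_token_logits(logits: torch.Tensor,
                              new_token_ids: torch.Tensor) -> torch.Tensor:
    """When decoding a pretrained language, forbid tokens added for the new
    languages by pushing their logits to -inf before the softmax."""
    logits = logits.clone()
    logits[..., new_token_ids] = float("-inf")
    return logits


def pcgrad_project(g_new: torch.Tensor, g_old: torch.Tensor) -> torch.Tensor:
    """PCGrad-style surgery on flattened decoder-layer gradients: if the
    new-language gradient conflicts with the replay gradient (negative dot
    product), remove its component along the replay direction."""
    dot = torch.dot(g_new, g_old)
    if dot < 0:
        g_new = g_new - (dot / g_old.pow(2).sum()) * g_old
    return g_new
```

A plausible wiring, under the same assumptions: call `freeze_unused_embeddings` once on the decoder's token embedding before training, apply `suppress_new_token_logits` only when the decoded utterance is in a pretrained language, and run `pcgrad_project` per decoder layer on gradients computed from the new-language batch and the replay batch before the optimizer step.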