An Empirical Study of Transformer-Based Neural Language Model Adaptation
Published in | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7934 - 7938 |
---|---|
Main Authors | |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2020 |
Summary | We explore two adaptation approaches for deep Transformer-based neural language models (LMs) in automatic speech recognition. The first is a pretrain-finetune framework: we pretrain a Transformer LM from scratch on a large-scale text corpus and then adapt it to relatively small target domains via finetuning. The second is a mixer of dynamically weighted models trained separately on the source and target domains, which aims to improve on simple linear interpolation by making the interpolation weights dynamic. We compare the two approaches against three baselines (no adaptation, data merging, and simple linear interpolation) on Switchboard (SWBD) and Wall Street Journal (WSJ). Experiments show that the mixer model generally outperforms both the baselines and finetuning. Compared with no adaptation, finetuning and the mixer approach obtain up to 11.5% and 14.1% relative WER reductions on SWBD, respectively. The mixer model also outperforms linear interpolation and data merging. On WSJ, the mixer approach achieves a new state-of-the-art WER result. |
ISSN | 2379-190X |
DOI | 10.1109/ICASSP40776.2020.9053399 |
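
The record does not spell out the mixer's exact formulation. As a rough sketch of the idea described in the summary, a dynamically weighted combination of a source-domain LM and a target-domain LM could be written as below; the symbols P_S, P_T, the weight network g, and the history h_t are illustrative assumptions, not notation taken from the paper.

```latex
% Illustrative sketch only; the paper's exact mixer formulation is not given in this record.
% P_S and P_T denote LMs trained on the source and target domains; h_t is the word history.
% Simple linear interpolation (the baseline) uses a single fixed weight \lambda:
%   P(w_t \mid h_t) = \lambda \, P_S(w_t \mid h_t) + (1 - \lambda) \, P_T(w_t \mid h_t).
% A dynamically weighted mixer instead predicts the weight from the current history:
\begin{align}
  \lambda_t &= \sigma\!\big(g(h_t)\big), \\
  P(w_t \mid h_t) &= \lambda_t \, P_S(w_t \mid h_t) + (1 - \lambda_t)\, P_T(w_t \mid h_t),
\end{align}
% where g(\cdot) is a small trainable scoring function and \sigma is the logistic sigmoid
% (both assumptions made here for illustration).
```

With a fixed weight this reduces to the simple-interpolation baseline; letting the weight depend on the history allows the mixture to lean on the source or target LM depending on the current context, which is the behavior the summary credits for the mixer's gains over static interpolation.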