Efficient Knowledge Distillation for RNN-Transducer Models

Bibliographic Details
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5639-5643
Main Authors: Panchapagesan, Sankaran; Park, Daniel S.; Chiu, Chung-Cheng; Shangguan, Yuan; Liang, Qiao; Gruenstein, Alexander
Format: Conference Proceeding
Language: English
Published: IEEE, 01.01.2021
Summary: Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition. Our proposed distillation loss is simple and efficient, and uses only the "y" and "blank" posterior probabilities from the RNN-T output probability lattice. We study the effectiveness of the proposed approach in improving the accuracy of sparse RNN-T models obtained by gradually pruning a larger uncompressed model, which also serves as the teacher during distillation. With distillation of 60% and 90% sparse multi-domain RNN-T models, we obtain WER reductions of 4.3% and 12.1% respectively, on a noisy FarField eval set. We also present results of experiments on LibriSpeech, where the introduction of the distillation loss yields a 4.8% relative WER reduction on the test-other dataset for a small Conformer model.
ISSN: 2379-190X
DOI: 10.1109/ICASSP39728.2021.9413905
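
Note on the method (illustrative only): the summary describes a distillation loss computed from just the "y" and "blank" posteriors of the RNN-T output probability lattice. Since the remaining probability mass follows from those two, each lattice node (t, u) can be summarized by a coarse three-way distribution over {correct next label, blank, everything else}, with the teacher-student mismatch measured by a KL divergence summed over the lattice. The NumPy sketch below is one reading of that description, not the paper's implementation; the function name, tensor layout, epsilon smoothing, and the exact handling of the remainder mass are assumptions.

import numpy as np


def rnnt_distillation_loss(teacher_logits, student_logits, labels,
                           blank_id=0, eps=1e-8):
    """Illustrative distillation loss over an RNN-T output lattice.

    teacher_logits, student_logits: arrays of shape [T, U+1, V] holding
        vocabulary logits at every lattice node (t, u) for one utterance.
    labels: length-U ground-truth label sequence (no blanks).

    At each node the vocabulary distribution is collapsed to three numbers:
    the probability of the correct next label y_{u+1}, of blank, and of the
    leftover mass. The loss is the KL divergence between the teacher's and
    student's collapsed distributions, summed over the lattice.
    """
    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    T, U1, _ = teacher_logits.shape

    def collapse(p, t, u):
        # Probability of the correct next label (none on the last row u = U).
        p_y = p[t, u, labels[u]] if u < len(labels) else 0.0
        p_b = p[t, u, blank_id]
        p_rest = max(1.0 - p_y - p_b, 0.0)
        q = np.array([p_y, p_b, p_rest]) + eps
        return q / q.sum()

    loss = 0.0
    for t in range(T):
        for u in range(U1):
            q_t = collapse(p_teacher, t, u)
            q_s = collapse(p_student, t, u)
            # KL(teacher || student) over the coarse 3-way distribution.
            loss += float(np.sum(q_t * (np.log(q_t) - np.log(q_s))))
    return loss


# Toy usage with random logits: 5 frames, 3 labels, vocabulary of 10.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 4, 10))
student = rng.normal(size=(5, 4, 10))
print(rnnt_distillation_loss(teacher, student, labels=[3, 7, 2]))

In practice the per-node KL terms would be computed in a batched, vectorized way and added to the RNN-T training loss with a weighting factor; the explicit double loop here is only for clarity.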