Cache based recurrent neural network language model inference for first pass speech recognition

Bibliographic Details
Published in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6354-6358
Main Authors: Huang, Zhiheng; Zweig, Geoffrey; Dumoulin, Benoit
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2014

Summary: Recurrent neural network language models (RNNLMs) have recently produced improvements on language processing tasks ranging from machine translation to word tagging and speech recognition. To date, however, the computational expense of RNNLMs has hampered their application to first pass decoding. In this paper, we show that by restricting the RNNLM calls to those words that receive a reasonable score according to an n-gram model, and by deploying a set of caches, we can reduce the cost of using an RNNLM in the first pass to that of using an additional n-gram model. We compare this scheme to lattice rescoring, and find that the two produce comparable results on a Bing Voice Search task. The best performance results from rescoring a lattice that is itself created with an RNNLM in the first pass.
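
The scheme the summary describes, gating RNNLM calls by the n-gram score and caching the expensive calls, can be illustrated with a minimal Python sketch. This is not the authors' implementation: ngram_logprob, rnnlm_logprob, the gating floor NGRAM_FLOOR, the cache size, and the equal interpolation weights are all hypothetical stand-ins chosen for clarity, not models or settings from the paper.

    from functools import lru_cache
    import math

    NGRAM_FLOOR = math.log(1e-4)  # hypothetical gating threshold, not from the paper

    def ngram_logprob(history: tuple, word: str) -> float:
        # Stand-in for a real back-off n-gram model; returns a fixed stub value.
        return math.log(1e-3)

    def rnnlm_logprob(history: tuple, word: str) -> float:
        # Stand-in for a full RNNLM forward pass; this is the expensive call.
        return math.log(2e-3)

    @lru_cache(maxsize=100_000)
    def cached_rnnlm_logprob(history: tuple, word: str) -> float:
        # Memoize RNNLM scores so repeated (history, word) lookups during
        # decoding hit the cache instead of re-running the network.
        return rnnlm_logprob(history, word)

    def first_pass_logprob(history: tuple, word: str) -> float:
        ngram = ngram_logprob(history, word)
        if ngram < NGRAM_FLOOR:
            # Implausible under the cheap model: skip the RNNLM entirely
            # and fall back to the n-gram score alone.
            return ngram
        # Plausible word: combine the n-gram and (cached) RNNLM scores.
        rnn = cached_rnnlm_logprob(history, word)
        return 0.5 * ngram + 0.5 * rnn  # equal weights assumed for illustration

    print(first_pass_logprob(("first", "pass"), "decoding"))

The point of the gate is that only words the n-gram model already considers plausible ever trigger the expensive recurrent computation, while the cache absorbs the repeated queries a first-pass decoder generates.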
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2014.6854827