Online Automatic Speech Recognition With Listen, Attend and Spell Model

Bibliographic Details
Published in	IEEE Signal Processing Letters, Vol. 27, pp. 1889–1893
Main Authors	Hsiao, Roger; Can, Dogan; Ng, Tim; Travadi, Ruchir; Ghoshal, Arnab
Format	Journal Article
Language	English
Published	New York: IEEE, 2020 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects	Acoustics; Attention; Automatic speech recognition; Computational modeling; Decoding; Earth Observing System; End-to-end ASR; Error analysis; Hidden Markov models; Mandarin; Markov analysis; Markov chains; Model accuracy; Neural networks; Online recognition; Speech recognition; Training; Voice recognition

More Information
Summary:	The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this letter, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and from the reliability of the online attention mechanism at the edge of input buffers. We propose a novel and simple technique that can achieve fully online recognition while meeting accuracy and latency targets. For the Mandarin dictation task, our proposed approach achieves a character error rate (CER) in online operation that is within 4% (relative) of that of an offline LAS model. The proposed online LAS model operates at 12% lower latency than a conventional neural network/hidden Markov model (NN-HMM) hybrid of comparable accuracy. We have validated the proposed method through a production-scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.
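Both headline results in the summary are stated as relative figures. The record does not spell out the letter's exact definition, so the following is only a minimal sketch assuming the usual convention, (candidate − baseline) / baseline. The helper names `relative_change` and `is_within_relative` are hypothetical, and every number below is a placeholder chosen for illustration, not a result from the paper.

```python
def relative_change(candidate: float, baseline: float) -> float:
    """Relative change of candidate vs. baseline as a fraction (0.04 == 4%).

    Assumed convention: (candidate - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


def is_within_relative(candidate: float, baseline: float, tolerance: float) -> bool:
    """True if candidate differs from baseline by at most `tolerance` (relative)."""
    return abs(relative_change(candidate, baseline)) <= tolerance


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT figures reported in the letter.
    offline_cer = 10.0  # CER (%) of an offline model
    online_cer = 10.3   # CER (%) of an online model
    print(f"relative CER change: {relative_change(online_cer, offline_cer):.1%}")
    print("within 4% relative:", is_within_relative(online_cer, offline_cer, 0.04))

    hybrid_latency_ms = 500.0      # hypothetical hybrid-system latency
    online_las_latency_ms = 440.0  # hypothetical online-LAS latency
    print(f"relative latency change: "
          f"{relative_change(online_las_latency_ms, hybrid_latency_ms):.1%}")
```

Under this convention a negative value means the candidate is lower than the baseline, so a "12% lower latency" claim corresponds to a relative change of about −0.12.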
ISSN:	1070-9908; 1558-2361
DOI:	10.1109/LSP.2020.3031480