Two-pass end to end speech recognition
Main Authors | , , , , , , , , , , , |
---|---|
Format | Patent |
Language | English |
Published | 16.02.2023 |
Subjects | |
Summary: | Two-pass automatic speech recognition (ASR) models can perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model that generates streaming candidate recognition(s) of the utterance. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model that revises the streaming candidate recognition(s) and generates the text representation of the utterance. For example, the second-pass portion can include a listen, attend and spell (LAS) decoder. Various implementations include an encoder shared between the RNN-T decoder and the LAS decoder. |
---|---|
Bibliography: | Application Number: AU20200288565 |
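
The data flow described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `TwoPassRecognizer`, the three `toy_*` functions, the `TARGET` table, and the scoring rules are all hypothetical stand-ins chosen only to show the structure, namely one encoder whose outputs feed both a streaming first pass (the role of the RNN-T decoder) and a full-context second pass (the role of the LAS decoder) that rescores the first-pass candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]          # one audio feature frame (toy stand-in)
Hypothesis = Tuple[str, float]   # (text, score); a higher score is better


@dataclass
class TwoPassRecognizer:
    """Hypothetical wiring of a shared encoder with two decoding passes."""
    encode_frame: Callable[[Frame], float]
    first_pass: Callable[[List[float]], List[Hypothesis]]
    second_pass: Callable[[List[float], str], float]
    _encoded: List[float] = field(default_factory=list, init=False)
    _candidates: List[Hypothesis] = field(default_factory=list, init=False)

    def stream(self, frames: Sequence[Frame]) -> Iterator[str]:
        """Encode frames one at a time and yield the current best partial text."""
        for frame in frames:
            # The encoder output is stored once and reused by both passes.
            self._encoded.append(self.encode_frame(frame))
            self._candidates = self.first_pass(self._encoded)
            yield max(self._candidates, key=lambda h: h[1])[0]

    def finalize(self) -> str:
        """Rescore the first-pass candidates using the whole utterance."""
        if not self._candidates:
            return ""
        rescored = [(text, self.second_pass(self._encoded, text))
                    for text, _ in self._candidates]
        return max(rescored, key=lambda h: h[1])[0]


def toy_encoder(frame: Frame) -> float:
    # Assumption: a real encoder is a neural network; here it is a frame mean.
    return sum(frame) / len(frame)


def toy_first_pass(encoded: List[float]) -> List[Hypothesis]:
    # Assumption: one word per frame, plus one alternative spelling.
    words = ["hi" if e < 0.5 else "there" for e in encoded]
    best = " ".join(words)
    alt = best.replace("there", "their")
    return [(best, -1.0), (alt, -1.2)]


# Hypothetical per-word "acoustic targets" used only by the toy second pass.
TARGET = {"hi": 0.3, "there": 0.4, "their": 0.8}


def toy_second_pass(encoded: List[float], text: str) -> float:
    # Assumption: score a candidate by how well its words match every
    # encoder output, standing in for attention over the full utterance.
    return -sum(abs(e - TARGET.get(w, 0.0))
                for e, w in zip(encoded, text.split()))


if __name__ == "__main__":
    rec = TwoPassRecognizer(toy_encoder, toy_first_pass, toy_second_pass)
    for partial in rec.stream([[0.2, 0.4], [0.9, 0.7]]):
        print("partial:", partial)
    print("final:", rec.finalize())
```

In this sketch the streaming pass commits to its own top-scoring candidate while audio is still arriving, and `finalize` may pick a different candidate once the second pass has seen all encoder outputs. Because both passes read the same stored encoder outputs, the audio is encoded only once.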