Two-pass end to end speech recognition

Bibliographic Details
Main Authors SAINATH, Tara C; HE, Yanzhang; LIANG, Qiao; PANG, Ruoming; STROHMAN, Trevor; PRABHAVALKAR, Rohit; RYBACH, David; LI, Wei; VISONTAI, Mirkó; MCGRAW, Ian C; WU, Yonghui; CHIU, Chung-Cheng
Format Patent
Language English
Published 16.02.2023

Summary: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of the utterance. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) and generate the final text representation of the utterance. For example, the second-pass portion can include a Listen, Attend and Spell (LAS) decoder. Various implementations include an encoder shared between the RNN-T decoder and the LAS decoder.
Bibliography: Application Number: AU20200288565
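
The control flow described in the summary (one shared encoder, a streaming first pass, and a second pass that revises the first-pass candidates) might be sketched roughly as below. This is only an illustration under stated assumptions, not the patented implementation: every name here (`Hypothesis`, `two_pass_recognize`, the `toy_*` stand-ins) is hypothetical, the decoders are placeholder functions rather than RNN-T or LAS networks, and "revise" is modeled as rescoring the candidates, which is just one possible reading of the summary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

Frame = List[float]  # placeholder type for one encoded audio frame


@dataclass
class Hypothesis:
    text: str
    score: float  # higher is better


def two_pass_recognize(
    audio_chunks: Iterable[Sequence[float]],
    encode: Callable[[Sequence[float]], List[Frame]],
    first_pass: Callable[[List[Frame]], List[Hypothesis]],
    second_pass_score: Callable[[List[Frame], str], float],
    on_partial: Callable[[str], None],
) -> str:
    """Hypothetical two-pass loop: stream partials, then revise at the end."""
    encoded: List[Frame] = []
    candidates: List[Hypothesis] = []
    for chunk in audio_chunks:
        # Shared encoder: each chunk is encoded once and reused by both passes.
        encoded.extend(encode(chunk))
        # First pass (an RNN-T decoder in the summary): streaming candidates.
        candidates = first_pass(encoded)
        if candidates:
            on_partial(max(candidates, key=lambda h: h.score).text)
    if not candidates:
        return ""
    # Second pass (a LAS decoder in the summary), assumed here to rescore
    # the first-pass candidates against the full encoder output.
    rescored = [
        Hypothesis(h.text, second_pass_score(encoded, h.text)) for h in candidates
    ]
    return max(rescored, key=lambda h: h.score).text


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_encode(chunk: Sequence[float]) -> List[Frame]:
    return [[float(x)] for x in chunk]


def toy_first_pass(encoded: List[Frame]) -> List[Hypothesis]:
    n = len(encoded)  # pretend one frame yields one character
    return [
        Hypothesis("hello word"[:n], 0.6),
        Hypothesis("hello world"[:n], 0.4),
    ]


def toy_second_pass_score(encoded: List[Frame], text: str) -> float:
    return 1.0 if text.endswith("world") else 0.0
```

Calling `two_pass_recognize` with two chunks of 5 and 6 samples, the toy stand-ins, and `partials.append` as the callback streams the partials "hello" and "hello word", then returns "hello world" once the second pass rescores the candidates. The sketch is meant to show that the encoder output is computed once and shared, while the final text can differ from the last streamed partial.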