Two-pass end to end speech recognition

Two-pass automatic speech recognition (ASR) models can be used to perform streaming on- device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an...

Full description

Saved in:

Bibliographic Details
Main Authors	SAINATH, Tara C, HE, Yanzhang, LIANG, Qiao, PANG, Ruoming, STROHMAN, Trevor, PRABHAVALKAR, Rohit, RYBACH, David, LI, Wei, VISONTAI, Mirkó, MCGRAW, Ian C, WU, Yonghui, CHIU, Chung-Cheng
Format	Patent
Language	English
Published	16.02.2023
Subjects	ACOUSTICS CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Two-pass automatic speech recognition (ASR) models can be used to perform streaming on- device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
Bibliography:	Application Number: AU20200288565