Two-pass end to end speech recognition

Bibliographic Details
Main Authors SAINATH, Tara C, HE, Yanzhang, LIANG, Qiao, PANG, Ruoming, STROHMAN, Trevor, PRABHAVALKAR, Rohit, RYBACH, David, LI, Wei, VISONTAI, Mirkó, MCGRAW, Ian C, WU, Yonghui, CHIU, Chung-Cheng
Format Patent
Language English
Published 16.02.2023

Abstract Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
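The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`two_pass_recognize`, `toy_encode`, `toy_first_pass`, `toy_second_pass_score`) are hypothetical, the components are toy stand-ins rather than neural networks, and treating the second pass as a rescoring of first-pass candidates is an assumption about one way the revision step could work.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Encoding = Tuple[float, ...]


def two_pass_recognize(
    frames: Sequence[Frame],
    encode: Callable[[Frame], Encoding],
    first_pass: Callable[[List[Encoding]], List[str]],
    second_pass_score: Callable[[List[Encoding], str], float],
) -> Tuple[List[str], str]:
    """Return (streaming partial results, final text) for one utterance."""
    encodings: List[Encoding] = []
    partials: List[str] = []
    candidates: List[str] = []
    for frame in frames:
        # The shared encoder runs once per frame; both passes reuse its output.
        encodings.append(encode(frame))
        # First pass (RNN-T in the abstract): streaming candidates so far.
        candidates = first_pass(encodings)
        if candidates:
            partials.append(candidates[0])
    if not candidates:
        return partials, ""
    # Second pass (LAS in the abstract), assumed here to rescore candidates
    # over the full sequence of shared encodings once the utterance ends.
    final = max(candidates, key=lambda text: second_pass_score(encodings, text))
    return partials, final


# Toy stand-ins (hypothetical) so the sketch runs end to end.
WORDS = ["turn", "on", "the", "lights"]


def toy_encode(frame: Frame) -> Encoding:
    return (sum(frame) / len(frame),)


def toy_first_pass(encodings: List[Encoding]) -> List[str]:
    n = min(len(encodings), len(WORDS))
    top = " ".join(WORDS[:n])
    alt = " ".join(WORDS[: n - 1] + ["light"])
    return [alt, top]  # deliberately ranks the weaker candidate first


def toy_second_pass_score(encodings: List[Encoding], text: str) -> float:
    return 1.0 if text.endswith("lights") else 0.0


frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
partials, final = two_pass_recognize(
    frames, toy_encode, toy_first_pass, toy_second_pass_score
)
```

In this toy run the first pass streams one partial result per frame, and the second pass replaces the last streamed candidate with the one it scores highest, which mirrors the "revise the streaming candidate recognition(s)" step in the abstract.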
ContentType Patent
DBID EVB
DatabaseName esp@cenet
DatabaseTitleList
Database_xml – sequence: 1
  dbid: EVB
  name: esp@cenet
  url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Chemistry
Sciences
Physics
ExternalDocumentID AU2020288565BB2
GroupedDBID EVB
ID FETCH-epo_espacenet_AU2020288565BB23
IEDL.DBID EVB
IngestDate Fri Oct 25 05:38:03 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-epo_espacenet_AU2020288565BB23
Notes Application Number: AU20200288565
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230216&DB=EPODOC&CC=AU&NR=2020288565B2
ParticipantIDs epo_espacenet_AU2020288565BB2
PublicationCentury 2000
PublicationDate 20230216
PublicationDateYYYYMMDD 2023-02-16
PublicationDate_xml – month: 02
  year: 2023
  text: 20230216
  day: 16
PublicationDecade 2020
PublicationYear 2023
RelatedCompanies GOOGLE LLC
RelatedCompanies_xml – name: GOOGLE LLC
SourceID epo
SourceType Open Access Repository
SubjectTerms ACOUSTICS
CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
URI https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230216&DB=EPODOC&locale=&CC=AU&NR=2020288565B2
linkProvider European Patent Office