Two-pass end to end speech recognition

Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an...

Bibliographic Details
Main Authors	SAINATH, Tara C; HE, Yanzhang; LIANG, Qiao; PANG, Ruoming; STROHMAN, Trevor; PRABHAVALKAR, Rohit; RYBACH, David; LI, Wei; VISONTAI, Mirkó; MCGRAW, Ian C; WU, Yonghui; CHIU, Chung-Cheng
Format	Patent
Language	English
Published	16.02.2023
Subjects	ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Abstract	Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include an encoder shared by the RNN-T decoder and the LAS decoder.
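The data flow the abstract describes (one encoder, a streaming first pass that proposes candidates, a second pass that revises them) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the patented method: the functions `shared_encoder`, `first_pass`, and `second_pass_score`, the hard-coded candidate strings, and the idea of combining the two scores by linear interpolation with `weight` are all assumptions made for illustration. The record does not say how the second pass revises the candidates.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # score assigned by the streaming first pass


def shared_encoder(frames: Sequence[float]) -> List[float]:
    """Toy encoder; its output is reused by both passes."""
    return [2.0 * f for f in frames]


def first_pass(encoding: List[float]) -> List[Hypothesis]:
    """Toy stand-in for a streaming decoder (e.g. RNN-T) emitting candidates."""
    base = sum(encoding)
    return [
        Hypothesis("recognize speech", base - 1.0),
        Hypothesis("wreck a nice beach", base - 0.5),
    ]


def second_pass_score(encoding: List[float], hyp: Hypothesis) -> float:
    """Toy stand-in for a full-utterance rescorer (e.g. LAS).

    It simply prefers hypotheses with fewer words, purely for illustration.
    """
    return -float(len(hyp.text.split()))


def two_pass_recognize(frames: Sequence[float], weight: float = 0.5) -> str:
    """Run both passes over one shared encoding and return the best text.

    `weight` is an assumed interpolation factor between the two scores.
    """
    encoding = shared_encoder(frames)   # computed once
    candidates = first_pass(encoding)   # streaming candidates

    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.first_pass_score + weight * second_pass_score(encoding, h)

    return max(candidates, key=combined).text


if __name__ == "__main__":
    frames = [0.1, 0.2, 0.3]
    print(two_pass_recognize(frames, weight=0.0))  # first pass only
    print(two_pass_recognize(frames, weight=0.5))  # revised by second pass
```

With `weight=0.0` the first-pass ranking is returned unchanged; with a non-zero weight the second pass can overturn it, which is the "revise" step in miniature. The encoder runs once per utterance, so the second pass adds only decoder cost.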
DatabaseName	esp@cenet
Notes	Application Number: AU20200288565
OpenAccessLink	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20230216&DB=EPODOC&CC=AU&NR=2020288565B2
RelatedCompanies	GOOGLE LLC