EXPORTING MODULAR ENCODER FEATURES FOR STREAMING AND DELIBERATION ASR


Bibliographic Details
Main Authors: Botros, Rami Magdi Fahmi; Sainath, Tara N; Schalkwyk, Johan; Beaufays, Francoise; Prabhavalkar, Rohit Prakash; Chelba, Ciprian Ioan
Format: Patent
Language: English
Published: 02.05.2024
Summary: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.
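The training flow in the summary can be illustrated with a toy sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the names W_BASE, W_DEC, and train_exporter, the tiny linear layers, the per-frame cross-entropy loss, and the one-label-per-frame alignment of the transcription. The exporter decoder is also held fixed here for brevity, which the summary does not specify. The only property the sketch is meant to show is that gradient updates from the decoder loss reach the exporter weights while the base encoder stays frozen.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    mx = max(z)
    exps = [math.exp(a - mx) for a in z]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical toy parameters (assumptions, not from the patent).
W_BASE = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen base encoder: 2 -> 3
W_DEC = [[1.0, -1.0], [-1.0, 1.0]]             # exporter decoder: 2 -> 2 labels, fixed here

def train_exporter(frames, targets, w_exp, lr=0.1, epochs=50):
    """Update only the exporter weights w_exp (in place); W_BASE is never modified."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(frames, targets):
            h = matvec(W_BASE, x)           # first encoded representation
            e = matvec(w_exp, h)            # second encoded representation
            p = softmax(matvec(W_DEC, e))   # distribution over output labels
            total += -math.log(p[y])        # cross-entropy against the target label
            # Gradient of the loss with respect to the logits: p - onehot(y).
            dz = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]
            # Backpropagate through the fixed decoder to the exporter output.
            de = [sum(W_DEC[i][j] * dz[i] for i in range(len(dz)))
                  for j in range(len(e))]
            # Gradient step on the exporter weights only.
            for j in range(len(w_exp)):
                for k in range(len(h)):
                    w_exp[j][k] -= lr * de[j] * h[k]
        losses.append(total / len(frames))
    return w_exp, losses

if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    targets = [0, 1, 0, 1]
    _, losses = train_exporter(frames, targets, [[0.0] * 3 for _ in range(2)])
    print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
```

In a framework with automatic differentiation, the same effect would typically come from excluding the base encoder's parameters from the optimizer or stopping gradients at its output.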
Bibliography: Application Number: US202318494763