Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy

Bibliographic Details
Main Authors: Barra-Chicote, Roberto, Breen, Andrew Paul, Sawaf, Hassan, Enyedi, Robert, Krishnaswamy, Arvindh, Isik, Mehmet Umut, Giri, Ritwik, Federico, Marcello, Al-Onaizan, Yaser
Format: Patent
Language: English
Published: 03.01.2023

Summary: Techniques for the generation of dubbed audio for an audio/video file are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.
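The summary describes a per-speaker dubbing pipeline: extract and attribute speech segments, translate them, pick a speaker-specific synthesis model, condition synthesis on the original prosody, and splice the result back into the audio track. The Python sketch below is a minimal illustration of how such a loop could be organized; the names (SpeechSegment, dub_audio_track, translate, tts_models, synthesize) are illustrative assumptions and not the patent's actual implementation.

```python
# Hypothetical sketch of the dubbing pipeline described in the summary.
# All classes, functions, and fields here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SpeechSegment:
    speaker_id: str   # identified speaker for this segment
    start: float      # segment start time in the audio track (seconds)
    end: float        # segment end time (seconds)
    text: str         # transcribed source-language text
    prosody: dict     # e.g. pitch contour, energy, duration features


def dub_audio_track(
    segments: List[SpeechSegment],
    translate: Callable[[str, str], str],   # (text, target_lang) -> translated text
    tts_models: Dict[str, object],          # one trained TTS model per identified speaker
    target_lang: str,
) -> Dict[Tuple[float, float], object]:
    """Generate dubbed speech for each segment, keyed by (start, end) so the
    caller can replace the original segments in the audio track."""
    dubbed = {}
    for seg in segments:
        # Translate the extracted speech segment into the target language.
        translated_text = translate(seg.text, target_lang)
        # Select the model determined for this identified speaker, and
        # condition synthesis on the segment's prosody information.
        model = tts_models[seg.speaker_id]
        audio = model.synthesize(translated_text, prosody=seg.prosody)
        dubbed[(seg.start, seg.end)] = audio
    return dubbed
```

Keying the synthesis models by speaker id mirrors the "determine a machine learning model per identified speaker" step, and returning audio keyed by segment timestamps leaves the final splice (replacing the extracted segments to produce the modified audio track) to the caller.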
Bibliography: Application Number: US201916709792