Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy

Bibliographic Details
Main Authors: Barra-Chicote, Roberto, Breen, Andrew Paul, Sawaf, Hassan, Enyedi, Robert, Krishnaswamy, Arvindh, Isik, Mehmet Umut, Giri, Ritwik, Federico, Marcello, Al-Onaizan, Yaser
Format: Patent
Language: English
Published: 03.01.2023

Summary: Techniques for the generation of dubbed audio for an audio/video file are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.
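The summary describes a per-speaker dubbing pipeline: extract and attribute speech segments, translate them, pick a speaker-specific synthesis model, condition synthesis on the original prosody, and splice the result back into the audio track. The Python sketch below is a minimal illustration of how such a loop could be organized; the names (SpeechSegment, dub_audio_track, translate, tts_models, synthesize) are illustrative assumptions and not the patent's actual implementation.

```python
# Hypothetical sketch of the dubbing pipeline described in the summary.
# All classes, functions, and fields here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SpeechSegment:
    speaker_id: str   # identified speaker for this segment
    start: float      # segment start time in the audio track (seconds)
    end: float        # segment end time (seconds)
    text: str         # transcribed source-language text
    prosody: dict     # e.g. pitch contour, energy, duration features


def dub_audio_track(
    segments: List[SpeechSegment],
    translate: Callable[[str, str], str],   # (text, target_lang) -> translated text
    tts_models: Dict[str, object],          # one trained TTS model per identified speaker
    target_lang: str,
) -> Dict[Tuple[float, float], object]:
    """Generate dubbed speech for each segment, keyed by (start, end) so the
    caller can replace the original segments in the audio track."""
    dubbed = {}
    for seg in segments:
        # Translate the extracted speech segment into the target language.
        translated_text = translate(seg.text, target_lang)
        # Select the model determined for this identified speaker, and
        # condition synthesis on the segment's prosody information.
        model = tts_models[seg.speaker_id]
        audio = model.synthesize(translated_text, prosody=seg.prosody)
        dubbed[(seg.start, seg.end)] = audio
    return dubbed
```

Keying the synthesis models by speaker id mirrors the "determine a machine learning model per identified speaker" step, and returning audio keyed by segment timestamps leaves the final splice (replacing the extracted segments to produce the modified audio track) to the caller.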
Bibliography: Application Number: US201916709792