Photorealistic talking faces from audio

Bibliographic Details
Main Authors: Kwatra, Vivek; Lewis, John; Frueh, Christian; Lahiri, Avisek
Format: Patent
Language: English
Published: 09.07.2024

More Information
Summary: Provided is a framework for generating photorealistic 3D talking faces conditioned only on audio input. In addition, the present disclosure provides associated methods to insert generated faces into existing videos or virtual environments. We decompose faces from video into a normalized space that decouples 3D geometry, head pose, and texture. This allows separating the prediction problem into regressions over the 3D face shape and the corresponding 2D texture atlas. To stabilize temporal dynamics, we propose an auto-regressive approach that conditions the model on its previous visual state. We also capture face illumination in our model using audio-independent 3D texture normalization.
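The abstract describes two regressors (3D face shape and 2D texture atlas) driven by audio, with auto-regressive conditioning on the previous visual state for temporal stability. A minimal sketch of that control flow is below; the dimensions, the linear stand-in regressors, and all names are hypothetical placeholders, not the patent's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the patent): per-frame audio feature
# size, 3D shape coefficient count, and flattened texture-atlas size.
AUDIO_DIM, SHAPE_DIM, TEX_DIM = 16, 8, 32

# Stand-in linear "regressors" for illustration only: each maps the
# concatenation [audio features, previous prediction] to a new prediction.
W_shape = rng.normal(scale=0.1, size=(SHAPE_DIM, AUDIO_DIM + SHAPE_DIM))
W_tex = rng.normal(scale=0.1, size=(TEX_DIM, AUDIO_DIM + TEX_DIM))

def predict_sequence(audio_frames):
    """Auto-regressively predict (shape, texture) for each audio frame,
    conditioning every step on the previous frame's own prediction."""
    shape_prev = np.zeros(SHAPE_DIM)  # previous visual state: shape
    tex_prev = np.zeros(TEX_DIM)      # previous visual state: texture
    preds = []
    for a in audio_frames:
        # Each regression sees the current audio plus its last output,
        # which is the temporal-stabilization idea in the abstract.
        shape_prev = np.tanh(W_shape @ np.concatenate([a, shape_prev]))
        tex_prev = np.tanh(W_tex @ np.concatenate([a, tex_prev]))
        preds.append((shape_prev.copy(), tex_prev.copy()))
    return preds

frames = rng.normal(size=(5, AUDIO_DIM))
preds = predict_sequence(frames)
```

The feedback loop is the point of the sketch: because each step consumes its predecessor's output, per-frame jitter in the audio features is smoothed rather than rendered directly.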
Bibliography: Application Number: US202117796399