LiP-Flow: Learning Inference-Time Priors for Codec Avatars via Normalizing Flows in Latent Space
| Published in | Computer Vision - ECCV 2022, Vol. 13686, pp. 92-110 |
|---|---|
| Main Authors | Emre Aksan, Shugao Ma, Akin Caliskan, Stanislav Pidhorskyi, Alexander Richard, Shih-En Wei, Jason Saragih, Otmar Hilliges |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer Nature Switzerland, 2022 |
| Series | Lecture Notes in Computer Science |
Summary: Neural face avatars that are trained from multi-view data captured in camera domes can produce photo-realistic 3D reconstructions. However, at inference time, they must be driven by limited inputs such as partial views recorded by headset-mounted cameras or a front-facing camera, and sparse facial landmarks. To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space. Our proposed model, LiP-Flow, consists of two encoders that learn representations from the rich training-time and impoverished inference-time observations. A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective. We train our model end-to-end to maximize the similarity of both representation spaces and the reconstruction quality, making the 3D face model aware of the limited driving signals. We conduct extensive evaluations in which the latent codes are optimized to reconstruct 3D avatars from partial or sparse observations, and show that our approach leads to an expressive and effective prior, one that better captures facial dynamics and subtle expressions. Check out our project page for an overview.
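The summary describes a concrete training recipe: two encoders (one for rich dome captures, one for impoverished driving signals), a normalizing flow bridging the two latent spaces, and a joint latent-likelihood plus reconstruction objective trained end-to-end. The sketch below illustrates that recipe in PyTorch under stated assumptions: MLP encoders, a RealNVP-style affine coupling flow, and toy tensor shapes. All names and dimensions are illustrative, not the authors' implementation.

```python
# Minimal sketch of the training objective described in the summary.
# Encoders, flow architecture, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

LATENT = 32  # illustrative latent size

def mlp(d_in, d_out, hidden=128):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_out))

class GaussianEncoder(nn.Module):
    """Encodes an observation into a diagonal Gaussian over the latent space."""
    def __init__(self, d_in):
        super().__init__()
        self.net = mlp(d_in, 2 * LATENT)
    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        return mu, log_var

class AffineCoupling(nn.Module):
    """RealNVP-style layer: scales/shifts half the dims given the other half."""
    def __init__(self):
        super().__init__()
        self.net = mlp(LATENT // 2, LATENT)  # predicts scale and shift
    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)  # bound the log-scales for stability
        z2 = z2 * torch.exp(s) + t
        log_det = s.sum(dim=-1)  # log |det Jacobian| of the affine map
        return torch.cat([z2, z1], dim=-1), log_det  # swap halves between layers

class LatentFlow(nn.Module):
    """Maps inference-time latent samples into the training-time latent space."""
    def __init__(self, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([AffineCoupling() for _ in range(n_layers)])
    def forward(self, z):
        log_det = torch.zeros(z.shape[0], device=z.device)
        for layer in self.layers:
            z, ld = layer(z)
            log_det = log_det + ld
        return z, log_det

def gaussian_log_prob(z, mu, log_var):
    # Diagonal Gaussian log-density (additive constant omitted).
    return (-0.5 * (log_var + (z - mu) ** 2 / log_var.exp())).sum(dim=-1)

# One hypothetical training step: x_rich stands in for a multi-view dome
# observation, x_drive for the limited driving signal (HMC views / landmarks).
enc_rich, enc_drive = GaussianEncoder(256), GaussianEncoder(64)
flow, decoder = LatentFlow(), mlp(LATENT, 256)

x_rich, x_drive = torch.randn(8, 256), torch.randn(8, 64)
mu_p, lv_p = enc_rich(x_rich)    # training-time latent distribution
mu_q, lv_q = enc_drive(x_drive)  # inference-time latent distribution
z_q = mu_q + torch.randn_like(mu_q) * (0.5 * lv_q).exp()  # reparameterized sample
z_p, log_det = flow(z_q)         # bridge the two latent spaces

# Latent likelihood objective: flowed inference-time samples should be likely
# under the training-time distribution (change of variables adds log_det).
loss_latent = -(gaussian_log_prob(z_p, mu_p, lv_p) + log_det).mean()
loss_recon = (decoder(z_p) - x_rich).pow(2).mean()  # stand-in for avatar reconstruction
(loss_recon + loss_latent).backward()               # both terms trained end-to-end
```

Because the flow is driven by samples from the inference-time encoder, the reconstruction term makes the decoder (the stand-in for the 3D face model) aware of the limited driving signals, as the summary describes.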
Bibliography: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-19809-0_6. E. Aksan and A. Caliskan performed this work during an internship at Meta Reality Labs Research.
ISBN: 9783031198083; 3031198085
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-031-19809-0_6