VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions. Our method is real-time and produces high-fidelity and view-consistent output, suitable for 3D teleconferencing systems based on holograph...
Saved in:
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
07.12.2023
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | We present a 3D-aware one-shot head reenactment method based on a fully
volumetric neural disentanglement framework for source appearance and driver
expressions. Our method is real-time and produces high-fidelity and
view-consistent output, suitable for 3D teleconferencing systems based on
holographic displays. Existing cutting-edge 3D-aware reenactment methods often
use neural radiance fields or 3D meshes to produce view-consistent appearance
encoding, but, at the same time, they rely on linear face models, such as 3DMM,
to achieve its disentanglement with facial expressions. As a result, their
reenactment results often exhibit identity leakage from the driver or have
unnatural expressions. To address these problems, we propose a neural
self-supervised disentanglement approach that lifts both the source image and
driver video frame into a shared 3D volumetric representation based on
tri-planes. This representation can then be freely manipulated with expression
tri-planes extracted from the driving images and rendered from an arbitrary
view using neural radiance fields. We achieve this disentanglement via
self-supervised learning on a large in-the-wild video dataset. We further
introduce a highly effective fine-tuning approach to improve the
generalizability of the 3D lifting using the same real-world data. We
demonstrate state-of-the-art performance on a wide range of datasets, and also
showcase high-quality 3D-aware head reenactment on highly challenging and
diverse subjects, including non-frontal head poses and complex expressions for
both source and driver. |
---|---|
DOI: | 10.48550/arxiv.2312.04651 |