Real-Time Radiance Fields for Single-Image Portrait View Synthesis
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 03.05.2023 |
Subjects | |
Online Access | Get full text |
Summary: | We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real-time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against the state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can also be applied in the future to other categories with a 3D-aware image generator. |
---|---|
DOI: | 10.48550/arxiv.2305.02310 |
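
For readers who want a concrete picture of the pipeline the summary describes, below is a minimal, self-contained PyTorch sketch of a triplane encoder, plane sampling, and NeRF-style volume rendering, followed by a toy distillation step. It is an illustration under stated assumptions, not the authors' implementation: the CNN backbone (the paper uses a Vision Transformer), the layer sizes, the toy ray setup, and the plain MSE loss against teacher renders are all placeholders, and the names `TriplaneEncoder`, `FeatureDecoder`, `sample_triplanes`, and `volume_render` are hypothetical.

```python
# Hedged sketch of the triplane radiance-field pipeline outlined in the summary.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriplaneEncoder(nn.Module):
    """Toy stand-in for the paper's encoder: RGB image -> 3 feature planes."""

    def __init__(self, feat_dim=32, plane_res=64):
        super().__init__()
        self.feat_dim = feat_dim
        self.plane_res = plane_res
        # Placeholder CNN; the paper uses a Vision Transformer backbone.
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * feat_dim, 3, padding=1),
        )

    def forward(self, img):                        # img: (B, 3, H, W)
        planes = self.net(img)                     # (B, 3*C, H/2, W/2)
        planes = F.interpolate(planes, size=(self.plane_res,) * 2,
                               mode="bilinear", align_corners=False)
        return planes.view(img.shape[0], 3, self.feat_dim,
                           self.plane_res, self.plane_res)


class FeatureDecoder(nn.Module):
    """Tiny MLP mapping sampled triplane features to color and density."""

    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 4))

    def forward(self, feats):                      # feats: (..., C)
        out = self.mlp(feats)
        return torch.sigmoid(out[..., :3]), out[..., 3:]   # rgb, sigma


def sample_triplanes(planes, pts):
    """Project 3D points onto the XY/XZ/YZ planes and sum sampled features.

    planes: (B, 3, C, R, R); pts: (B, N, 3) with coordinates in [-1, 1].
    Returns features of shape (B, N, C).
    """
    feats = 0
    for i, axes in enumerate(([0, 1], [0, 2], [1, 2])):
        grid = pts[..., axes].unsqueeze(1)         # (B, 1, N, 2) for grid_sample
        f = F.grid_sample(planes[:, i], grid, mode="bilinear",
                          align_corners=False)     # (B, C, 1, N)
        feats = feats + f.squeeze(2).transpose(1, 2)
    return feats


def volume_render(planes, rays_o, rays_d, decoder, n_samples=32):
    """Minimal NeRF-style renderer over triplane features along camera rays."""
    B, N, _ = rays_o.shape
    t = torch.linspace(0.1, 2.0, n_samples, device=rays_o.device)
    pts = rays_o[:, :, None] + rays_d[:, :, None] * t[None, None, :, None]
    rgb, sigma = decoder(sample_triplanes(planes, pts.reshape(B, -1, 3)))
    rgb = rgb.view(B, N, n_samples, 3)
    alpha = 1.0 - torch.exp(-F.relu(sigma.view(B, N, n_samples)) * (t[1] - t[0]))
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]),
                                     1.0 - alpha + 1e-10], -1), -1)[..., :-1]
    return ((alpha * trans)[..., None] * rgb).sum(dim=2)   # (B, N, 3)


# Toy distillation step: random tensors stand in for a frozen 3D GAN teacher's
# synthetic image and its rendered colors along the same camera rays.
encoder, decoder = TriplaneEncoder(), FeatureDecoder()
img = torch.rand(1, 3, 128, 128)
rays_o = torch.zeros(1, 256, 3)
rays_d = F.normalize(torch.rand(1, 256, 3) - 0.5, dim=-1)
pred = volume_render(encoder(img), rays_o, rays_d, decoder)
loss = F.mse_loss(pred, torch.rand(1, 256, 3))   # stand-in for the paper's loss
loss.backward()
```

In the paper's actual setup, the training targets come from a frozen, pretrained 3D GAN rendered under a camera data augmentation strategy, and the loss is the carefully designed objective the abstract mentions; here a random tensor and plain MSE stand in for both.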