HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 7733 - 7743
Main Authors	Zhao, Fuqiang; Yang, Wei; Zhang, Jiakai; Lin, Pei; Zhang, Yingliang; Yu, Jingyi; Xu, Lan
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2022
Subjects	Cameras; Computer vision; Dynamics; Entertainment industry; Image and video synthesis and generation; 3D from multi-view and sensors; Face and gestures; Motion and tracking; Pose estimation and tracking; Rendering (computer graphics); Telepresence; Training
Summary:	Recent neural human representations can produce high-quality multi-view rendering but require dense multi-view inputs and costly training. They are hence largely limited to static models, as training each frame is infeasible. We present HumanNeRF, a neural representation with efficient generalization ability, for high-fidelity free-view synthesis of dynamic humans. Analogous to how IBRNet assists NeRF by avoiding per-scene training, HumanNeRF employs an aggregated pixel-alignment feature across multi-view inputs along with a pose-embedded non-rigid deformation field for tackling dynamic motions. The raw HumanNeRF can already produce reasonable rendering on sparse video inputs of unseen subjects and camera settings. To further improve the rendering quality, we augment our solution with in-hour scene-specific fine-tuning and an appearance blending module for combining the benefits of both neural volumetric rendering and neural texture blending. Extensive experiments on various multi-view dynamic human datasets demonstrate the effectiveness of our approach in synthesizing photo-realistic free-view humans under challenging motions and with very sparse camera view inputs.
ISSN:	1063-6919
DOI:	10.1109/CVPR52688.2022.00759