FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

Bibliographic Details
Main Authors: Wang, Rong; Prada, Fabian; Wang, Ziyan; Jiang, Zhongshi; Yin, Chengxiang; Li, Junxuan; Saito, Shunsuke; Santesteban, Igor; Romero, Javier; Joshi, Rohan; Li, Hongdong; Saragih, Jason; Sheikh, Yaser
Format: Journal Article
Language: English
Published: 24.03.2025
Summary: We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. Due to the large variations in body shapes, poses, and cloth types, existing methods mostly require hours of per-subject optimization during inference, which limits their practical applications. In contrast, we learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Specifically, instead of rigging the avatar with shared skinning weights, we jointly infer personalized avatar shape, skinning weights, and pose-dependent deformations, which effectively improves overall geometric fidelity and reduces deformation artifacts. Moreover, to normalize pose variations and resolve the coupled ambiguity between canonical shapes and skinning weights, we design a 3D canonicalization process that produces pixel-aligned initial conditions, which helps to reconstruct fine-grained geometric details. We then propose a multi-frame feature aggregation scheme that robustly reduces artifacts introduced during canonicalization and fuses a plausible avatar that preserves person-specific identity. Finally, we train the model in an end-to-end framework on a large-scale capture dataset, which contains diverse human subjects paired with high-quality 3D scans. Extensive experiments show that our method generates more authentic reconstructions and animations than state-of-the-art methods, and generalizes directly to inputs from casually taken phone photos. Project page and code are available at https://github.com/rongakowang/FRESA.
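The core idea in the abstract, rigging the avatar with personalized rather than shared skinning weights, builds on linear blend skinning (LBS). As a rough orientation only, here is a minimal LBS sketch in NumPy: the function name, array shapes, and toy data are hypothetical, and FRESA's actual model additionally predicts pose-dependent deformations, which this sketch omits.

```python
# Minimal linear blend skinning (LBS) sketch, illustrating how a canonical
# avatar mesh can be posed with per-vertex skinning weights. This is NOT
# the FRESA implementation; all names and the toy data are hypothetical.
import numpy as np

def linear_blend_skinning(verts, weights, bone_transforms):
    """Pose canonical vertices with per-vertex skinning weights.

    verts:           (V, 3) canonical vertex positions
    weights:         (V, J) skinning weights, each row sums to 1
    bone_transforms: (J, 4, 4) rigid transform per joint
    returns:         (V, 3) posed vertex positions
    """
    V = verts.shape[0]
    verts_h = np.concatenate([verts, np.ones((V, 1))], axis=1)  # homogeneous, (V, 4)
    # Blend the joint transforms per vertex: (V, 4, 4)
    blended = np.einsum("vj,jab->vab", weights, bone_transforms)
    posed_h = np.einsum("vab,vb->va", blended, verts_h)
    return posed_h[:, :3]

# Toy example: two vertices, two joints (identity and a 90-degree z-rotation).
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
rot_z = np.eye(4)
rot_z[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
bones = np.stack([np.eye(4), rot_z])
print(linear_blend_skinning(verts, weights, bones))
```

The 3D canonicalization step the abstract describes can be thought of as the inverse direction of the same machinery: mapping observed, posed geometry back into a shared canonical pose, which is what makes jointly inferring canonical shape and skinning weights an ambiguous, coupled problem.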
DOI: 10.48550/arxiv.2503.19207