FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
Main Authors | , , , , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 24.03.2025 |
Summary: | We present a novel method for reconstructing personalized 3D human avatars
with realistic animation from only a few images. Due to the large variations in
body shapes, poses, and clothing types, existing methods mostly require hours of
per-subject optimization during inference, which limits their practical
applications. In contrast, we learn a universal prior from over a thousand
clothed humans to achieve instant feedforward generation and zero-shot
generalization. Specifically, instead of rigging the avatar with shared
skinning weights, we jointly infer personalized avatar shape, skinning weights,
and pose-dependent deformations, which effectively improves overall geometric
fidelity and reduces deformation artifacts. Moreover, to normalize pose
variations and resolve the coupled ambiguity between canonical shapes and skinning
weights, we design a 3D canonicalization process to produce pixel-aligned
initial conditions, which helps to reconstruct fine-grained geometric details.
We then propose a multi-frame feature aggregation that robustly reduces artifacts
introduced during canonicalization and fuses a plausible avatar that preserves
person-specific identity. Finally, we train the model in an end-to-end
framework on a large-scale capture dataset, which contains diverse human
subjects paired with high-quality 3D scans. Extensive experiments show that our
method generates more authentic reconstructions and animations than
state-of-the-art methods and generalizes directly to inputs from casually
taken phone photos. The project page and code are available at
https://github.com/rongakowang/FRESA. |
---|---|
DOI: | 10.48550/arxiv.2503.19207 |
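The summary describes jointly inferring personalized skinning weights and pose-dependent deformations instead of rigging every avatar with shared weights. To make those two quantities concrete, here is a minimal pure-Python sketch of standard linear blend skinning (LBS), the step such per-vertex inputs typically feed into. It assumes each bone transform is a 4x4 row-major affine matrix and that pose-dependent offsets are added in canonical space before skinning; the function names (`apply_transform`, `skin`) are hypothetical, and this is not the FRESA implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # 4x4 affine bone transform, row-major (assumed layout)
Vec3 = Sequence[float]


def apply_transform(m: Matrix, p: Vec3) -> List[float]:
    """Apply a 4x4 affine transform to a 3D point with homogeneous w = 1."""
    return [m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3] for r in range(3)]


def skin(
    canonical: List[Vec3],
    offsets: List[Vec3],
    weights: List[List[float]],
    bones: List[Matrix],
) -> List[List[float]]:
    """Linear blend skinning with per-vertex weights and pose-dependent offsets.

    canonical: canonical-space vertex positions
    offsets:   pose-dependent corrective offsets (hypothetical input, one per vertex)
    weights:   per-vertex skinning weights over bones, each row summing to 1
    bones:     per-bone transforms for the target pose
    """
    posed = []
    for v, d, w in zip(canonical, offsets, weights):
        # Add the pose-dependent deformation in canonical space first.
        x = [v[i] + d[i] for i in range(3)]
        out = [0.0, 0.0, 0.0]
        # Blend the point as transformed by every bone, weighted per vertex.
        for wk, bk in zip(w, bones):
            tx = apply_transform(bk, x)
            for i in range(3):
                out[i] += wk * tx[i]
        posed.append(out)
    return posed


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    lift = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    # A vertex split evenly between a static bone and a bone lifted by 1 in y.
    print(skin([[1.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]], [[0.5, 0.5]], [identity, lift]))
    # -> [[1.0, 0.5, 0.0]]
```

Because LBS is linear, blending the transformed points is equivalent to transforming each vertex by its weight-blended matrix. In a personalized setting, predicted per-subject weights and offsets would simply replace the `weights` and `offsets` arguments here.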
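The summary's 3D canonicalization maps posed observations back into a pose-normalized space. The abstract does not give its exact formulation, so what follows is a pure-Python sketch of inverse linear blend skinning, a common generic building block for this kind of unposing. It assumes approximate per-vertex skinning weights and bone transforms are already available (for example, from a fitted body template). The helper names are hypothetical, and FRESA's actual canonicalization may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # 4x4 affine transform, row-major (assumed layout)
Vec3 = Sequence[float]


def apply_transform(m: Matrix, p: Vec3) -> List[float]:
    """Apply a 4x4 affine transform to a 3D point with homogeneous w = 1."""
    return [m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3] for r in range(3)]


def blend(weights: Sequence[float], bones: List[Matrix]) -> Matrix:
    """Weighted sum of bone transforms (the per-vertex LBS matrix)."""
    return [[sum(w * b[r][c] for w, b in zip(weights, bones)) for c in range(4)] for r in range(4)]


def invert_affine(m: Matrix) -> Matrix:
    """Invert a 4x4 affine matrix via the adjugate of its 3x3 linear part."""
    a, b, c = m[0][:3]
    d, e, f = m[1][:3]
    g, h, i = m[2][:3]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        # Blended rotations can degenerate (e.g. equal weights on opposing rotations).
        raise ValueError("blended transform is singular; cannot unpose this vertex")
    inv = [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]
    t = [m[r][3] for r in range(3)]
    # The inverse translation is -A^{-1} t.
    ti = [-sum(inv[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [inv[0] + [ti[0]], inv[1] + [ti[1]], inv[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]


def canonicalize(posed: List[Vec3], weights: List[List[float]], bones: List[Matrix]) -> List[List[float]]:
    """Inverse LBS: undo each vertex's blended transform to recover a canonical position."""
    return [apply_transform(invert_affine(blend(w, bones)), x) for x, w in zip(posed, weights)]
```

This sketch shows the coupling the summary mentions. The recovered canonical position depends directly on the weights used to unpose it, so inaccurate weights produce a distorted canonical shape even when the posed input is exact.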
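The summary's multi-frame feature aggregation fuses information from several input images into a single avatar while suppressing canonicalization artifacts. The abstract does not specify the fusion operator. As an assumption, the sketch below uses softmax-weighted pooling: each frame's feature vector at a given canonical location is weighted by a per-frame confidence score, so unreliable frames can be down-weighted. The `scores` input and the `aggregate` function are hypothetical.

```python
import math
from typing import List, Sequence


def aggregate(features: List[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Fuse per-frame feature vectors with softmax weights over per-frame scores."""
    if not features or len(features) != len(scores):
        raise ValueError("need at least one frame and exactly one score per frame")
    # Subtract the max score for a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    # Weighted sum over frames, computed independently for each feature channel.
    return [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]


if __name__ == "__main__":
    # Equal scores reduce to a plain mean over frames.
    print(aggregate([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]))  # -> [2.0, 1.0]
```

Equal scores recover mean pooling, and a strongly negative score effectively drops a frame. A learned model would more likely predict the scores (or use an attention module) than take them as fixed inputs.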