World-Grounded Human Motion Recovery via Gravity-View Coordinates
Main Authors | , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 10.09.2024 |
Subjects | |
Summary: | We present a novel method for recovering world-grounded human motion from
monocular video. The main challenge lies in the ambiguity of defining the world
coordinate system, which varies between sequences. Previous approaches attempt
to alleviate this issue by predicting relative motion in an autoregressive
manner, but are prone to accumulating errors. Instead, we propose estimating
human poses in a novel Gravity-View (GV) coordinate system, which is defined by
the world gravity and the camera view direction. The proposed GV system is
naturally gravity-aligned and uniquely defined for each video frame, greatly
reducing the ambiguity of learning the image-to-pose mapping. The estimated poses
can be transformed back to the world coordinate system using camera rotations,
forming a global motion sequence. Additionally, per-frame estimation avoids
the error accumulation of autoregressive methods. Experiments on in-the-wild
benchmarks demonstrate that our method recovers more realistic motion in both
the camera space and world-grounded settings, outperforming state-of-the-art
methods in both accuracy and speed. The code is available at
https://zju3dv.github.io/gvhmr/. |
---|---|
DOI: | 10.48550/arxiv.2409.06662 |
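
The summary describes the Gravity-View (GV) coordinate system only as being defined by the world gravity and the camera view direction. The following is a minimal sketch of how such a per-frame basis could be built from those two vectors; it is not taken from the authors' code. The axis convention (y along gravity, z as the horizontal part of the view direction), the function names `gravity_view_rotation` and `gv_to_world`, and the camera-to-world rotation argument `R_c2w` are all assumptions made for illustration.

```python
import numpy as np


def gravity_view_rotation(gravity_cam, view_cam=(0.0, 0.0, 1.0)):
    """Build a GV basis from gravity and view direction, both in camera coordinates.

    Assumed convention (not confirmed by the source): y points along gravity,
    x is perpendicular to gravity and view, z completes a right-handed frame.
    Returns a 3x3 rotation whose columns are the GV axes in the camera frame.
    """
    g = np.asarray(gravity_cam, dtype=float)
    v = np.asarray(view_cam, dtype=float)
    y = g / np.linalg.norm(g)
    x = np.cross(y, v)
    n = np.linalg.norm(x)
    if n < 1e-8:
        # Looking straight up or down: the view gives no horizontal heading.
        raise ValueError("view direction is parallel to gravity")
    x = x / n
    z = np.cross(x, y)  # view direction projected onto the horizontal plane
    return np.stack([x, y, z], axis=1)


def gv_to_world(R_c2w, R_gv2c):
    """Compose a camera-to-world rotation with a GV-to-camera rotation."""
    return np.asarray(R_c2w, dtype=float) @ R_gv2c


# A camera pitched by 30 degrees, so gravity is tilted in the camera frame.
pitch = np.deg2rad(30.0)
R_gv2c = gravity_view_rotation([0.0, np.cos(pitch), np.sin(pitch)])
R_gv2w = gv_to_world(np.eye(3), R_gv2c)
```

Because the basis depends only on the current frame's gravity and view vectors, it can be computed independently for every frame, which is the property the summary relies on to avoid accumulating error.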