Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 26.12.2024 |
Subjects | |
DOI | 10.48550/arxiv.2412.19089 |
Summary: | Recent works on dynamic 3D neural field reconstruction assume the input from
synchronized multi-view videos whose poses are known. The input constraints are
often not satisfied in real-world setups, making the approach impractical. We
show that unsynchronized videos from unknown poses can generate dynamic neural
fields as long as the videos capture human motion. Humans are one of the most
common dynamic subjects captured in videos, and their shapes and poses can be
estimated using state-of-the-art libraries. While noisy, the estimated human
shape and pose parameters provide a decent initialization point to start the
highly non-convex and under-constrained problem of training a consistent
dynamic neural representation. Given the shape and pose parameters of humans in
individual frames, we formulate methods to calculate the time offsets between
videos, followed by camera pose estimations that analyze the 3D joint
positions. Then, we train the dynamic neural fields employing multiresolution
grids while we concurrently refine both time offsets and camera poses. The
setup still involves optimizing many parameters; therefore, we introduce a
robust progressive learning strategy to stabilize the process. Experiments show
that our approach achieves accurate spatio-temporal calibration and
high-quality scene reconstruction in challenging conditions. |
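
The summary names two calibration steps, estimating time offsets between videos and estimating camera poses from 3D joint positions, but this record does not give the formulation. The following is only an illustrative sketch of how such steps could look, not the authors' method. It assumes that each video yields a 1D per-frame motion signal (for example a joint angle) and that two cameras observe the same set of 3D joints at a matched time. The function names `estimate_offset` and `kabsch`, the integer-frame correlation search, and the use of the Kabsch rigid alignment are all assumptions made for illustration.

```python
import numpy as np


def estimate_offset(sig_a, sig_b, max_shift):
    """Integer shift s that best aligns sig_a[i] with sig_b[i + s].

    Hypothetical helper: scores each candidate shift by the normalized
    correlation of the overlapping samples and returns the best one.
    """
    sig_a = np.asarray(sig_a, dtype=float)
    sig_b = np.asarray(sig_b, dtype=float)
    best_s, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = sig_a[: len(sig_a) - s], sig_b[s:]
        else:
            a, b = sig_a[-s:], sig_b[: len(sig_b) + s]
        n = min(len(a), len(b))
        if n < 2:
            continue  # not enough overlap to score this shift
        a = a[:n] - a[:n].mean()
        b = b[:n] - b[:n].mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0.0:
            continue  # constant signal, correlation undefined
        score = float(a @ b / denom)
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def kabsch(src, dst):
    """Rigid transform (R, t) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D joints, each expressed
    in a different camera's coordinate frame (an assumption here).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In this sketch, the shift from `estimate_offset` would be used to pair frames across two videos, and `kabsch` would then be applied to the paired joints to obtain a relative camera pose. Both results would serve only as an initialization, in line with the summary's statement that time offsets and camera poses are refined further during training.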