UniDepth: Universal Monocular Metric Depth Estimation
Format | Journal Article |
---|---|
Language | English |
Published | 27.03.2024 |
Summary: Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at inference time without any additional information, striving for a universal and flexible MMDE solution. In particular, UniDepth implements a self-promptable camera module predicting a dense camera representation to condition depth features. Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations. In addition, we propose a geometric invariance loss that promotes the invariance of camera-prompted depth features. Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth, even when compared with methods directly trained on the testing domains. Code and models are available at: https://github.com/lpiccinelli-eth/unidepth
DOI: 10.48550/arxiv.2403.18913
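
The pseudo-spherical output representation mentioned in the summary factors each per-pixel prediction into two camera angles (azimuth, elevation) plus a log-depth, so the camera and depth components occupy separate output axes. The sketch below shows one plausible backprojection from such a representation to metric 3D points; the specific angle convention and the use of radial depth are our assumptions for illustration, not the paper's exact parameterization.

```python
import torch

def pseudo_spherical_to_points(azimuth: torch.Tensor,
                               elevation: torch.Tensor,
                               log_depth: torch.Tensor) -> torch.Tensor:
    """Backproject per-pixel (azimuth, elevation, log-depth) predictions
    into Cartesian metric 3D points.

    Sketch only: the angle convention (azimuth around the vertical axis,
    elevation from the x-z plane) and radial depth along the ray are
    assumptions, not necessarily the paper's exact formulation.
    """
    depth = log_depth.exp()  # depth is predicted in log space
    # Unit ray direction through each pixel, from its two angles.
    x = torch.cos(elevation) * torch.sin(azimuth)
    y = torch.sin(elevation)
    z = torch.cos(elevation) * torch.cos(azimuth)
    rays = torch.stack((x, y, z), dim=-1)  # (..., 3), unit norm
    return depth.unsqueeze(-1) * rays      # (..., 3) metric 3D points
```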
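The geometric invariance loss can be read as a consistency constraint: depth features computed from two geometric augmentations of the same image, each prompted with its matching camera representation, should agree once mapped to a common frame. A minimal sketch of that idea, where the cosine-distance choice and the pre-aligned inputs are illustrative assumptions rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def geometric_invariance_loss(feats_ref: torch.Tensor,
                              feats_aug: torch.Tensor) -> torch.Tensor:
    """Consistency between camera-prompted depth features extracted from
    two augmentations of the same image.

    Sketch under assumptions: feats_aug is already warped back to the
    reference frame using the known augmentation, and cosine distance is
    an illustrative discrepancy measure. feats_*: (B, C, H, W).
    """
    return (1.0 - F.cosine_similarity(feats_ref, feats_aug, dim=1)).mean()
```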