LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

Bibliographic Details
Published in: Computer vision and image understanding, Vol. 227, p. 103601
Main Authors: Bartoccioni, Florent; Zablocki, Éloi; Pérez, Patrick; Cord, Matthieu; Alahari, Karteek
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.01.2023

Summary: Vision-based depth estimation is a key feature in autonomous systems, which often rely on a single camera or several independent ones. In such a monocular setup, dense depth is obtained either with additional input from one or several expensive LiDARs, e.g., with 64 beams, or with camera-only methods, which suffer from scale-ambiguity and infinite-depth problems. In this paper, we propose a new alternative: densely estimating metric depth by combining a monocular camera with a lightweight LiDAR, e.g., with 4 beams, typical of today's automotive-grade mass-produced laser scanners. Inspired by recent self-supervised methods, we introduce a novel framework, called LiDARTouch, to estimate dense depth maps from monocular images with the help of "touches" of LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the minimal LiDAR input contributes on three different levels: as an additional input to the model, in a self-supervised LiDAR reconstruction objective function, and in the estimation of pose changes (a key component of self-supervised depth estimation architectures). Our LiDARTouch framework achieves a new state of the art in self-supervised depth estimation on the KITTI dataset, supporting our choice of integrating the very sparse LiDAR signal with other visual features. Moreover, we show that the use of a few-beam LiDAR alleviates the scale-ambiguity and infinite-depth issues that camera-only methods suffer from. We also demonstrate that methods from the fully-supervised depth-completion literature can be adapted to a self-supervised regime with a minimal LiDAR signal.

Highlights:
• A 4-beam LiDAR, the only kind of LiDAR currently in consumer-grade vehicles, is used for depth estimation.
• Integrating few-beam LiDARs alleviates the scale-ambiguity and infinite-depth issues.
• A self-supervised model for metric and accurate depth estimation in any domain.
• State-of-the-art results on the self-supervised depth estimation task.
• An extensive study of the influence of LiDAR on input, pose estimation, and supervision.
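The summary describes a self-supervised objective in which the few-beam LiDAR contributes a sparse reconstruction term alongside the usual photometric reprojection term. As a rough illustration (not the authors' implementation), the sketch below shows how such a combined loss could look in PyTorch; the function names, the plain-L1 photometric term, and the lambda_lidar weight are all illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def lidar_reconstruction_loss(pred_depth: torch.Tensor,
                                  lidar_depth: torch.Tensor) -> torch.Tensor:
        # L1 error between the predicted dense depth map and the projected
        # few-beam LiDAR returns, evaluated only at pixels that a LiDAR point
        # actually hits (encoded here as depth > 0). This sparse term is what
        # anchors the metric scale without dense ground truth.
        mask = lidar_depth > 0
        if not mask.any():  # view with no LiDAR returns
            return pred_depth.new_zeros(())
        return F.l1_loss(pred_depth[mask], lidar_depth[mask])

    def photometric_loss(target: torch.Tensor,
                         warped: torch.Tensor) -> torch.Tensor:
        # Simplified photometric reprojection error between the target frame
        # and a source frame warped with the predicted depth and pose (plain
        # L1 here; self-supervised pipelines typically blend L1 with SSIM).
        return (target - warped).abs().mean()

    def total_loss(target, warped, pred_depth, lidar_depth,
                   lambda_lidar: float = 1.0) -> torch.Tensor:
        # Combined self-supervised objective; lambda_lidar is an illustrative
        # weight, not a value taken from the paper.
        return (photometric_loss(target, warped)
                + lambda_lidar * lidar_reconstruction_loss(pred_depth, lidar_depth))

A few-beam signal covers only a small fraction of the image, but masking the reconstruction term to valid returns suffices, per the summary above, to resolve the metric scale that camera-only self-supervision leaves ambiguous.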
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2022.103601