Self-supervised Visual-LiDAR Odometry with Flip Consistency

Bibliographic Details
Published in	arXiv.org
Main Authors	Li, Bin; Hu, Mu; Wang, Shuling; Wang, Lianghao; Gong, Xiaojin
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 05.01.2021
Subjects	Coders; Consistency; Decoders; Feature extraction; Ground truth; Lidar; Supervised learning

Summary:	Most learning-based methods estimate ego-motion by utilizing visual sensors, which suffer from dramatic lighting variations and textureless scenarios. In this paper, we incorporate sparse but accurate depth measurements obtained from lidars to overcome the limitations of visual methods. To this end, we design a self-supervised visual-lidar odometry (Self-VLO) framework. It takes both monocular images and sparse depth maps projected from 3D lidar points as input, and produces pose and depth estimations in an end-to-end learning manner, without using any ground truth labels. To effectively fuse the two modalities, we design a two-pathway encoder to extract features from visual and depth images and fuse the encoded features with those in decoders at multiple scales by our fusion module. We also adopt a Siamese architecture and design an adaptively weighted flip consistency loss to facilitate the self-supervised learning of our VLO. Experiments on the KITTI odometry benchmark show that the proposed approach outperforms all self-supervised visual or lidar odometries. It also performs better than fully supervised VOs, demonstrating the power of fusion.
ISSN:	2331-8422
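
The flip consistency idea named in the summary can be illustrated with a small sketch: a depth predictor is run on an image and on its horizontally mirrored copy, and the two outputs should agree once the second is mirrored back. This is only an illustration under stated assumptions, not the authors' implementation. The names `hflip`, `flip_consistency_loss`, `pointwise`, and `column_biased` are hypothetical, images are plain nested lists rather than tensors, and the loss uses uniform weights because the record does not describe the paper's adaptive weighting.

```python
from typing import Callable, List

# H x W grid of values; a plain-Python stand-in for an image tensor (assumption).
Image = List[List[float]]


def hflip(img: Image) -> Image:
    """Mirror an image left-to-right."""
    return [row[::-1] for row in img]


def flip_consistency_loss(predict_depth: Callable[[Image], Image], image: Image) -> float:
    """Mean absolute difference between the depth predicted for `image`
    and the mirrored-back depth predicted for its flipped copy.

    Uniform weighting is an assumption; the paper's adaptive weights are
    not described in the summary and are not reproduced here.
    """
    depth = predict_depth(image)
    depth_from_flipped = hflip(predict_depth(hflip(image)))
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(depth, depth_from_flipped)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def pointwise(img: Image) -> Image:
    """Toy predictor that commutes with flipping: depends only on pixel value."""
    return [[2.0 * v for v in row] for row in img]


def column_biased(img: Image) -> Image:
    """Toy predictor that does not commute with flipping: adds the column index."""
    return [[v + x for x, v in enumerate(row)] for row in img]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
loss_consistent = flip_consistency_loss(pointwise, image)
loss_inconsistent = flip_consistency_loss(column_biased, image)
```

In this toy setup the loss is zero for the predictor that commutes with flipping and positive for the one that depends on column position, which is the behaviour a flip consistency term is meant to penalize during training.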