Transformer guided geometry model for flow-based unsupervised visual odometry

Bibliographic Details
Published in	Neural computing & applications, Vol. 33, no. 13, pp. 8031-8042
Main Authors	Li, Xiangyu; Hou, Yonghong; Wang, Pichao; Gao, Zhimin; Xu, Mingliang; Li, Wanqing
Format	Journal Article
Language	English
Published	London: Springer London, 01.07.2021; Springer Nature B.V.
Subjects	Artificial Intelligence; Cameras; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Datasets; Deep learning; Estimators; Geometry; Image Processing and Computer Vision; Methods; Neural networks; Original Article; Probability and Statistics in Computer Science; Recurrent neural networks; Robotics; Training; Transformers; Transformer; Geometry constraint; Unsupervised learning; Visual odometry
Summary:	Existing unsupervised visual odometry (VO) methods either match pairwise images or integrate temporal information over a long sequence of images using recurrent neural networks. These methods are inaccurate, time-consuming to train, or prone to error accumulation. In this paper, we propose a method consisting of two camera pose estimators that handle information from pairwise images and from a short sequence of images, respectively. For image sequences, a transformer-like structure is adopted to build a geometry model over a local temporal window, referred to as the transformer-based auxiliary pose estimator (TAPE). Meanwhile, a flow-to-flow pose estimator (F2FPE) is proposed to exploit the relationship between pairwise images. The two estimators are constrained through a simple yet effective consistency loss in training. Empirical evaluation shows that the proposed method outperforms state-of-the-art unsupervised learning-based methods by a large margin and performs comparably to supervised and traditional methods on the KITTI and Malaga datasets.
ISSN:	0941-0643 1433-3058
DOI:	10.1007/s00521-020-05545-8
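
The summary says that the two estimators (TAPE and F2FPE) are tied together by a consistency loss during training, but this record does not give its form. The following is only a minimal sketch of what such a constraint could look like, not the authors' formulation. It assumes each estimator outputs one 6-DoF relative pose per frame pair as (tx, ty, tz, rx, ry, rz) and that the loss is the mean absolute difference between the two sets of predictions. The function name `consistency_loss`, the pose layout, and the L1 penalty are all illustrative assumptions.

```python
from typing import Sequence

# Assumed pose layout (not from the source): (tx, ty, tz, rx, ry, rz).
Pose = Sequence[float]


def consistency_loss(poses_a: Sequence[Pose], poses_b: Sequence[Pose]) -> float:
    """Mean absolute difference between two sets of relative poses.

    poses_a and poses_b stand in for the outputs of two pose estimators
    on the same frame pairs. The L1 form is an assumption chosen for
    illustration; the published loss may differ.
    """
    if len(poses_a) != len(poses_b):
        raise ValueError("both estimators must predict the same number of poses")
    if not poses_a:
        raise ValueError("at least one pose is required")

    total = 0.0
    count = 0
    for pose_a, pose_b in zip(poses_a, poses_b):
        if len(pose_a) != len(pose_b):
            raise ValueError("poses must have the same dimensionality")
        for a, b in zip(pose_a, pose_b):
            total += abs(a - b)
            count += 1
    return total / count


# Hypothetical predictions for two frame pairs from each estimator.
sequence_poses = [[0.10, 0.0, 1.00, 0.0, 0.02, 0.0],
                  [0.12, 0.0, 0.98, 0.0, 0.01, 0.0]]
pairwise_poses = [[0.10, 0.0, 1.00, 0.0, 0.02, 0.0],
                  [0.12, 0.0, 0.98, 0.0, 0.01, 0.0]]
loss = consistency_loss(sequence_poses, pairwise_poses)
```

In this sketch the loss is zero when both estimators agree on every pose component and grows linearly with their disagreement. In a training setup it would be added to the other unsupervised objectives so that the sequence-level and pairwise predictions regularize each other.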