Pix2Map: Cross-Modal Retrieval for Inferring Street Maps from Images

Bibliographic Details
Published in	2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17514-17523
Main Authors	Wu, Xindi; Lau, KwunFung; Ferroni, Francesco; Osep, Aljosa; Ramanan, Deva
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2023
Subjects	Cameras; Image retrieval; Layout; Location awareness; Multi-modal learning; Roads; Topology; Visualization
Summary:	Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.
ISSN:	2575-7075
DOI:	10.1109/CVPR52729.2023.01680
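
The retrieval step described in the summary can be illustrated with a minimal sketch. It assumes that images and map graphs have already been embedded into a shared vector space by some pair of encoders (not shown, and not specified in this record), and it uses cosine similarity as the matching score, which is an assumption rather than a detail confirmed by the summary. The names `cosine_similarity`, `retrieve_map`, and `map_library`, as well as the toy three-dimensional embeddings, are hypothetical and for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_map(image_embedding, map_library, top_k=1):
    """Return the ids of the top_k map graphs closest to the image embedding.

    map_library maps a (hypothetical) map-graph id to its embedding in the
    same space as image_embedding.
    """
    scored = [
        (cosine_similarity(image_embedding, embedding), map_id)
        for map_id, embedding in map_library.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [map_id for _, map_id in scored[:top_k]]


# Toy library of map-graph embeddings (made-up values, assumed precomputed).
map_library = {
    "straight_road": [1.0, 0.0, 0.0],
    "t_junction": [0.0, 1.0, 0.0],
    "four_way": [0.0, 0.0, 1.0],
}

# Toy embedding of an ego-view image (made-up values).
query = [0.1, 0.9, 0.2]

best = retrieve_map(query, map_library)
```

In this toy setup the query lies closest to the `t_junction` entry, so that graph is returned first. A real system would replace the hand-written vectors with learned image and graph encoders and would likely use an approximate nearest-neighbor index for large map libraries.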