SelfD: Self-Learning Large-Scale Driving Policies From the Web

Bibliographic Details
Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17295-17305
Main Authors: Zhang, Jimuyang; Zhu, Ruizhao; Ohn-Bar, Eshed
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2022
Summary: Effectively utilizing the vast amounts of ego-centric navigation data that is freely available on the internet can advance generalized intelligent systems, i.e., to robustly scale across perspectives, platforms, environmental conditions, scenarios, and geographical locations. However, it is difficult to directly leverage such large amounts of unlabeled and highly diverse data for complex 3D reasoning and planning tasks. Consequently, researchers have primarily focused on its use for various auxiliary pixel- and image-level computer vision tasks that do not consider an ultimate navigational objective. In this work, we introduce SelfD, a framework for learning scalable driving by utilizing large amounts of online monocular images. Our key idea is to leverage iterative semi-supervised training when learning imitative agents from unlabeled data. To handle unconstrained viewpoints, scenes, and camera parameters, we train an image-based model that directly learns to plan in the Bird's Eye View (BEV) space. Next, we use unlabeled data to augment the decision-making knowledge and robustness of an initially trained model via self-training. In particular, we propose a pseudo-labeling step which enables making full use of highly diverse demonstration data through "hypothetical" planning-based data augmentation. We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios. Without requiring any additional data collection or annotation efforts, SelfD demonstrates consistent improvements (by up to 24%) in driving performance evaluation on nuScenes, Argoverse, Waymo, and CARLA.
ISSN: 2575-7075
DOI: 10.1109/CVPR52688.2022.01680
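
The summary outlines an iterative pipeline: train a monocular image-to-BEV planner on labeled demonstrations, pseudo-label unlabeled web footage with that planner, and retrain on the pseudo-labeled data. The sketch below is only a rough illustration of such a self-training loop; the model, data loaders, loss, and waypoint shapes are assumed placeholders, not the authors' implementation.

```python
# Minimal sketch of the iterative semi-supervised (self-training) loop described
# in the summary: fit an image-to-BEV planner on labeled demonstrations,
# pseudo-label unlabeled web frames with it, then retrain on those pseudo-labels.
# Every name here (BEVPlanner, labeled_loader, unlabeled_loader, loss, shapes)
# is a hypothetical placeholder, not the authors' code.

import torch
import torch.nn as nn


class BEVPlanner(nn.Module):
    """Hypothetical monocular-image model that regresses a BEV waypoint plan."""

    def __init__(self, num_waypoints: int = 5):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_waypoints * 2)  # (x, y) per BEV waypoint

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(images)).view(-1, self.num_waypoints, 2)


def imitation_step(model, optimizer, images, target_plans):
    """One imitation-learning update against (pseudo-)labeled BEV plans."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(images), target_plans)
    loss.backward()
    optimizer.step()
    return loss.item()


def self_training_round(model, optimizer, labeled_loader, unlabeled_loader):
    """One round: supervised pass, then pseudo-label and retrain on web frames."""
    # 1) Refine the planner on labeled driving demonstrations.
    model.train()
    for images, plans in labeled_loader:
        imitation_step(model, optimizer, images, plans)

    # 2) Pseudo-label unlabeled frames with the current planner ...
    model.eval()
    with torch.no_grad():
        pseudo = [(images, model(images)) for images in unlabeled_loader]

    # 3) ... and treat those hypothetical plans as imitation targets.
    model.train()
    for images, pseudo_plans in pseudo:
        imitation_step(model, optimizer, images, pseudo_plans)
```

Note that the paper's pseudo-labeling additionally relies on "hypothetical" planning-based data augmentation to exploit unconstrained viewpoints and camera parameters; the simple loop above does not attempt to model that step.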