ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning

Bibliographic Details
Published in: Computer Vision - ECCV 2022, Vol. 13698, pp. 533-549
Main Authors: Hu, Shengchao; Chen, Li; Wu, Penghao; Li, Hongyang; Yan, Junchi; Tao, Dacheng
Format: Book Chapter
Language: English
Published: Springer Nature Switzerland, 01.01.2022
Series: Lecture Notes in Computer Science

Summary: Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird’s eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; a temporal-based refinement unit is introduced to compensate for the recognition of vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on both the open-loop nuScenes dataset and the closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.
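
The egocentric-aligned accumulation described in the summary can be pictured as warping past bird's-eye-view (BEV) feature maps into the current ego frame before fusing them. The following Python/PyTorch sketch is illustrative only and is not taken from the ST-P3 repository; the function names, tensor layouts, 3x3 planar pose convention, and the plain summation fusion are all assumptions made for this example.

    import torch
    import torch.nn.functional as F

    def warp_bev_to_current(bev_feat, ego_to_current, bev_resolution):
        # bev_feat:        (B, C, H, W) BEV features from a past frame (hypothetical layout).
        # ego_to_current:  (B, 3, 3) 2D rigid transform (rotation + translation, in metres)
        #                  mapping past-ego coordinates to current-ego coordinates.
        # bev_resolution:  metres per BEV cell.
        B, C, H, W = bev_feat.shape

        # Metric coordinates of the current-frame BEV cell centres, origin at the grid centre.
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=torch.float32),
            torch.arange(W, dtype=torch.float32),
            indexing="ij",
        )
        xs_m = (xs - W / 2) * bev_resolution
        ys_m = (ys - H / 2) * bev_resolution
        coords = torch.stack([xs_m, ys_m, torch.ones_like(xs)], dim=-1)   # (H, W, 3)

        # Map current-frame cells back into the past ego frame.
        past_from_current = torch.inverse(ego_to_current)                 # (B, 3, 3)
        coords = coords.reshape(1, H * W, 3)
        past_xy = torch.matmul(coords, past_from_current.transpose(1, 2))[..., :2]

        # Normalise to [-1, 1] so grid_sample can fetch the matching past features.
        grid = past_xy / bev_resolution
        grid = grid / torch.tensor([W / 2, H / 2])
        grid = grid.reshape(B, H, W, 2)
        return F.grid_sample(bev_feat, grid, align_corners=False)

    def accumulate(past_bev_feats, ego_to_current_poses, bev_resolution):
        # Warp every past frame into the current ego frame and fuse. A plain sum is
        # used here purely for illustration; the paper learns the fusion.
        fused = torch.zeros_like(past_bev_feats[0])
        for feat, pose in zip(past_bev_feats, ego_to_current_poses):
            fused = fused + warp_bev_to_current(feat, pose, bev_resolution)
        return fused

Aligning the features in a shared metric frame before accumulation is what lets the temporal history preserve 3D geometry rather than being blurred by ego-motion; the dual-pathway prediction and temporal refinement for planning described in the summary build on the fused BEV representation produced this way.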
Bibliography: Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-19839-7_31.
S. Hu and P. Wu: Work done during internship at Shanghai AI Laboratory.
ISBN: 9783031198380, 3031198387
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-031-19839-7_31