ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning


Bibliographic Details
Published in	Computer Vision - ECCV 2022 Vol. 13698; pp. 533 - 549
Main Authors	Hu, Shengchao, Chen, Li, Wu, Penghao, Li, Hongyang, Yan, Junchi, Tao, Dacheng
Format	Book Chapter
Language	English
Published	Switzerland Springer 01.01.2022 Springer Nature Switzerland
Series	Lecture Notes in Computer Science

Abstract	Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme, called ST-P3, that learns a set of more representative features for perception, prediction and planning tasks simultaneously. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird’s eye view transformation for perception; dual pathway modeling is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on both the open-loop nuScenes dataset and the closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.
Author	Yan, Junchi; Tao, Dacheng; Wu, Penghao; Li, Hongyang; Chen, Li; Hu, Shengchao
Author_xml	– sequence: 1 givenname: Shengchao surname: Hu fullname: Hu, Shengchao – sequence: 2 givenname: Li surname: Chen fullname: Chen, Li email: lichen@pjlab.org.cn – sequence: 3 givenname: Penghao surname: Wu fullname: Wu, Penghao – sequence: 4 givenname: Hongyang surname: Li fullname: Li, Hongyang – sequence: 5 givenname: Junchi surname: Yan fullname: Yan, Junchi – sequence: 6 givenname: Dacheng surname: Tao fullname: Tao, Dacheng
ContentType	Book Chapter
Copyright	The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
Copyright_xml	– notice: The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
DBID	FFUUA
DEWEY	006.37
DOI	10.1007/978-3-031-19839-7_31
DatabaseName	ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences Computer Science
EISBN	3031198395 9783031198397
EISSN	1611-3349
Editor	Farinella, Giovanni Maria; Avidan, Shai; Cissé, Moustapha; Brostow, Gabriel; Hassner, Tal
Editor_xml	– sequence: 1 fullname: Avidan, Shai – sequence: 2 fullname: Cissé, Moustapha – sequence: 3 fullname: Farinella, Giovanni Maria – sequence: 4 fullname: Brostow, Gabriel – sequence: 5 fullname: Hassner, Tal
EndPage	549
ExternalDocumentID	EBC7120763_475_590
GroupedDBID	38. AABBV AAZWU ABSVR ABTHU ABVND ACBPT ACHZO ACPMC ADNVS AEDXK AEJLV AEKFX AHVRR ALMA_UNASSIGNED_HOLDINGS BBABE CZZ FFUUA IEZ SBO TPJZQ TSXQS Z5O Z7R Z7S Z7U Z7W Z7X Z7Y Z7Z Z81 Z82 Z83 Z84 Z85 Z87 Z88 -DT -~X 29L 2HA 2HV ACGFS ADCXD EJD F5P LAS LDH P2P RSU ~02
ID	FETCH-LOGICAL-j309t-db1b06f0a543eba41c54e3099d7a75ffbb9f197202bff26379714ffc5b24d8743
ISBN	9783031198380 3031198387
ISSN	0302-9743
IngestDate	Tue Jul 29 20:14:13 EDT 2025 Thu May 29 01:35:39 EDT 2025
IsPeerReviewed	true
IsScholarly	true
LCCallNum	TA1634
Language	English
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-j309t-db1b06f0a543eba41c54e3099d7a75ffbb9f197202bff26379714ffc5b24d8743
Notes	Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-19839-7_31. S. Hu and P. Wu: work done during internship at Shanghai AI Laboratory.
OCLC	1351251656
PQID	EBC7120763_475_590
PageCount	17
ParticipantIDs	springer_books_10_1007_978_3_031_19839_7_31 proquest_ebookcentralchapters_7120763_475_590
PublicationCentury	2000
PublicationDate	2022-01-01
PublicationDateYYYYMMDD	2022-01-01
PublicationDate_xml	– month: 01 year: 2022 text: 2022-01-01 day: 01
PublicationDecade	2020
PublicationPlace	Switzerland
PublicationPlace_xml	– name: Switzerland – name: Cham
PublicationSeriesTitle	Lecture Notes in Computer Science
PublicationSeriesTitleAlternate	Lect.Notes Computer
PublicationSubtitle	17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVIII
PublicationTitle	Computer Vision - ECCV 2022
PublicationYear	2022
Publisher	Springer; Springer Nature Switzerland
Publisher_xml	– name: Springer – name: Springer Nature Switzerland
RelatedPersons	Hartmanis, Juris; Gao, Wen; Steffen, Bernhard; Bertino, Elisa; Goos, Gerhard; Yung, Moti
RelatedPersons_xml	– sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Moti orcidid: 0000-0003-0848-0873 surname: Yung fullname: Yung, Moti
SSID	ssj0002731130 ssj0002792
Score	2.5557544
SourceID	springer proquest
SourceType	Publisher
StartPage	533
Title	ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning
URI	http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7120763&ppg=590 http://link.springer.com/10.1007/978-3-031-19839-7_31
Volume	13698
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Library Specific Holdings
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Computer+Vision+%E2%80%93+ECCV+2022&rft.au=Hu%2C+Shengchao&rft.au=Chen%2C+Li&rft.au=Wu%2C+Penghao&rft.au=Li%2C+Hongyang&rft.atitle=ST-P3%3A+End-to-End+Vision-Based+Autonomous+Driving+via%C2%A0Spatial-Temporal+Feature+Learning&rft.series=Lecture+Notes+in+Computer+Science&rft.date=2022-01-01&rft.pub=Springer+Nature+Switzerland&rft.isbn=9783031198380&rft.issn=0302-9743&rft.eissn=1611-3349&rft.spage=533&rft.epage=549&rft_id=info:doi/10.1007%2F978-3-031-19839-7_31