ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning

Bibliographic Details
Published in Computer Vision - ECCV 2022 Vol. 13698; pp. 533-549
Main Authors Hu, Shengchao, Chen, Li, Wu, Penghao, Li, Hongyang, Yan, Junchi, Tao, Dacheng
Format Book Chapter
Language English
Published Switzerland Springer 01.01.2022
Springer Nature Switzerland
Series Lecture Notes in Computer Science
Abstract Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme, called ST-P3, that yields a set of more representative features for perception, prediction and planning tasks simultaneously. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird’s eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on both the open-loop nuScenes dataset and the closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.
Author Yan, Junchi
Tao, Dacheng
Wu, Penghao
Li, Hongyang
Chen, Li
Hu, Shengchao
Author_xml – sequence: 1
  givenname: Shengchao
  surname: Hu
  fullname: Hu, Shengchao
– sequence: 2
  givenname: Li
  surname: Chen
  fullname: Chen, Li
  email: lichen@pjlab.org.cn
– sequence: 3
  givenname: Penghao
  surname: Wu
  fullname: Wu, Penghao
– sequence: 4
  givenname: Hongyang
  surname: Li
  fullname: Li, Hongyang
– sequence: 5
  givenname: Junchi
  surname: Yan
  fullname: Yan, Junchi
– sequence: 6
  givenname: Dacheng
  surname: Tao
  fullname: Tao, Dacheng
ContentType Book Chapter
Copyright The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
Copyright_xml – notice: The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
DBID FFUUA
DEWEY 006.37
DOI 10.1007/978-3-031-19839-7_31
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Computer Science
EISBN 3031198395
9783031198397
EISSN 1611-3349
Editor Farinella, Giovanni Maria
Avidan, Shai
Cissé, Moustapha
Brostow, Gabriel
Hassner, Tal
Editor_xml – sequence: 1
  fullname: Avidan, Shai
– sequence: 2
  fullname: Cissé, Moustapha
– sequence: 3
  fullname: Farinella, Giovanni Maria
– sequence: 4
  fullname: Brostow, Gabriel
– sequence: 5
  fullname: Hassner, Tal
EndPage 549
ExternalDocumentID EBC7120763_475_590
ISBN 9783031198380
3031198387
ISSN 0302-9743
IngestDate Tue Jul 29 20:14:13 EDT 2025
Thu May 29 01:35:39 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum TA1634
Language English
LinkModel OpenURL
Notes Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-19839-7_31.
S. Hu and P. Wu: Work done during internship at Shanghai AI Laboratory.
OCLC 1351251656
PQID EBC7120763_475_590
PageCount 17
ParticipantIDs springer_books_10_1007_978_3_031_19839_7_31
proquest_ebookcentralchapters_7120763_475_590
PublicationCentury 2000
PublicationDate 2022-01-01
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – month: 01
  year: 2022
  text: 2022-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVIII
PublicationTitle Computer Vision - ECCV 2022
PublicationYear 2022
Publisher Springer
Springer Nature Switzerland
Publisher_xml – name: Springer
– name: Springer Nature Switzerland
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 533
Title ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7120763&ppg=590
http://link.springer.com/10.1007/978-3-031-19839-7_31
Volume 13698