Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Bibliographic Details
Published in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1 - 7
Main Authors: Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei
Format: Conference Proceeding
Language: English
Published: IEEE, 20.09.2020
DOI: 10.1109/ITSC45102.2020.9294262


Abstract Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles for two predominant reasons: their behaviors are unexplainable, and they cannot guarantee safety under new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks approximate the policy function, the value function, and a newly added risk function, respectively. Meanwhile, a trust-region constraint is added to allow large update steps without breaking the monotonic improvement condition. To ensure the feasibility of the safety-constrained problem, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates. Simulations of two scenarios for autonomous vehicles confirm that the method ensures safety while achieving fast learning.
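The three-component framework described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every name here (`LinearNet`, `policy_net`, `risk_net`, the threshold `d`) is hypothetical, each "network" is reduced to a single linear map, and the trust-region and parallel-learner machinery is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

class LinearNet:
    """Stand-in for a neural network: y = W s + b (illustration only)."""
    def __init__(self, out_dim):
        self.W = rng.normal(scale=0.1, size=(out_dim, STATE_DIM))
        self.b = np.zeros(out_dim)
    def __call__(self, s):
        return self.W @ s + self.b

# Three separate approximators, one per component of the framework:
policy_net = LinearNet(ACTION_DIM)  # pi(s): state -> action
value_net  = LinearNet(1)           # V(s):  state -> expected return
risk_net   = LinearNet(1)           # R(s):  state -> expected constraint cost

s = rng.normal(size=STATE_DIM)      # a dummy observed state
action = policy_net(s)
value = value_net(s)[0]
risk = risk_net(s)[0]

# A safety-constrained update would only accept policy steps whose
# predicted risk stays below a threshold d (hypothetical value here).
d = 1.0
step_is_safe = bool(risk <= d)
print(action.shape, step_is_safe)
```

The point of the third network is that risk is estimated separately from return, so the update rule can reject otherwise reward-improving steps that violate the safety constraint.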
Author_xml – sequence: 1
  givenname: Lu
  surname: Wen
  fullname: Wen, Lu
  email: lulwen@umich.edu
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 2
  givenname: Jingliang
  surname: Duan
  fullname: Duan, Jingliang
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 3
  givenname: Shengbo Eben
  surname: Li
  fullname: Li, Shengbo Eben
  email: lisb04@gmail.com
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 4
  givenname: Shaobing
  surname: Xu
  fullname: Xu, Shaobing
  organization: University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105
– sequence: 5
  givenname: Huei
  surname: Peng
  fullname: Peng, Huei
  organization: University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105
ContentType Conference Proceeding
DOI 10.1109/ITSC45102.2020.9294262
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1728141494
9781728141497
EndPage 7
ExternalDocumentID 9294262
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 7
ParticipantIDs ieee_primary_9294262
PublicationCentury 2000
PublicationDate 2020-09-20
PublicationDateYYYYMMDD 2020-09-20
PublicationDate_xml – month: 09
  year: 2020
  text: 2020-09-20
  day: 20
PublicationDecade 2020
PublicationTitle 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)
PublicationTitleAbbrev ITSC
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial neural networks
Autonomous vehicles
Linear programming
Optimization
Reinforcement learning
Safety
Security
Title Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization
URI https://ieeexplore.ieee.org/document/9294262
hasFullText 1
inHoldings 1