Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization
Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles for two predominant problems: their behaviors are unexplainable, and they cannot guarantee safety under new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks are used to approximate the policy function, the value function, and a newly added risk function, respectively. Meanwhile, a trust region constraint is added to allow large update steps without breaking the monotonic improvement condition. To ensure the feasibility of the safety-constrained problem, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates. Simulations of two autonomous-driving scenarios confirm that the approach ensures safety while achieving fast learning.
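The three-component architecture described in the abstract can be pictured with a short sketch. The code below is not the authors' implementation; it is a minimal illustration, assuming a PyTorch setup, of an actor-critic agent extended with a separate risk network, with the trust-region and parallel-learner machinery left as comments. The class name, layer sizes, and the `mlp` helper are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' code) of the three-network
# layout PCPO is described as using: a policy network (actor), a value
# network for expected return, and a risk network for expected constraint cost.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    """Small fully connected network; sizes are illustrative assumptions."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


class PCPOAgentSketch(nn.Module):
    """Actor-critic extended with a risk head, per the abstract's description."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.policy = mlp(obs_dim, act_dim)  # approximates the policy function
        self.value = mlp(obs_dim, 1)         # approximates the value function
        self.risk = mlp(obs_dim, 1)          # newly added risk function
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def act(self, obs):
        # Gaussian policy: sample an action and return its log-probability,
        # which a trust-region-constrained policy update would later use.
        mean = self.policy(obs)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = dist.sample()
        return action, dist.log_prob(action).sum(-1)
```

In the paper's scheme, the policy update is additionally bounded by a trust-region constraint and by a safety constraint estimated with the risk network, and several synchronized learners explore different state spaces in parallel; those update rules are beyond this sketch.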
Published in | 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) pp. 1 - 7 |
---|---|
Main Authors | Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 20.09.2020 |
Subjects | Artificial neural networks; Autonomous vehicles; Linear programming; Optimization; Reinforcement learning; Safety; Security |
DOI | 10.1109/ITSC45102.2020.9294262 |
Author | Wen, Lu (Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084; lulwen@umich.edu); Duan, Jingliang (Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084); Li, Shengbo Eben (Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084; lisb04@gmail.com); Xu, Shaobing (University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105); Peng, Huei (University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105) |
EISBN | 1728141494; 9781728141497 |
EndPage | 7 |
ExternalDocumentID | 9294262 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
PageCount | 7 |
PublicationDate | 2020-09-20 |
PublicationTitle | 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) |
PublicationTitleAbbrev | ITSC |
PublicationYear | 2020 |
Publisher | IEEE |
StartPage | 1 |
URI | https://ieeexplore.ieee.org/document/9294262 |