Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Bibliographic Details
Published in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1 - 7
Main Authors: Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei
Format: Conference Proceeding
Language: English
Published: IEEE, 20.09.2020
DOI: 10.1109/ITSC45102.2020.9294262


Abstract Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles for two predominant reasons: their behaviors are unexplainable, and they cannot guarantee safety under new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks approximate the policy function, the value function, and a newly added risk function, respectively. Meanwhile, a trust-region constraint is added to allow large update steps without breaking the monotonic improvement condition. To ensure the feasibility of the safety-constrained problem, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates. Simulations of two scenarios for autonomous vehicles confirm that the method ensures safety while achieving fast learning.
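The three-component framework described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every name here (`LinearNet`, `policy_net`, `risk_net`, the threshold `d`) is hypothetical, each "network" is reduced to a single linear map, and the trust-region and parallel-learner machinery is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

class LinearNet:
    """Stand-in for a neural network: y = W s + b (illustration only)."""
    def __init__(self, out_dim):
        self.W = rng.normal(scale=0.1, size=(out_dim, STATE_DIM))
        self.b = np.zeros(out_dim)
    def __call__(self, s):
        return self.W @ s + self.b

# Three separate approximators, one per component of the framework:
policy_net = LinearNet(ACTION_DIM)  # pi(s): state -> action
value_net  = LinearNet(1)           # V(s):  state -> expected return
risk_net   = LinearNet(1)           # R(s):  state -> expected constraint cost

s = rng.normal(size=STATE_DIM)      # a dummy observed state
action = policy_net(s)
value = value_net(s)[0]
risk = risk_net(s)[0]

# A safety-constrained update would only accept policy steps whose
# predicted risk stays below a threshold d (hypothetical value here).
d = 1.0
step_is_safe = bool(risk <= d)
print(action.shape, step_is_safe)
```

The point of the third network is that risk is estimated separately from return, so the update rule can reject otherwise reward-improving steps that violate the safety constraint.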
Author_xml – sequence: 1
  givenname: Lu
  surname: Wen
  fullname: Wen, Lu
  email: lulwen@umich.edu
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 2
  givenname: Jingliang
  surname: Duan
  fullname: Duan, Jingliang
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 3
  givenname: Shengbo Eben
  surname: Li
  fullname: Li, Shengbo Eben
  email: lisb04@gmail.com
  organization: Tsinghua University, School of Vehicle and Mobility, Beijing, China, 100084
– sequence: 4
  givenname: Shaobing
  surname: Xu
  fullname: Xu, Shaobing
  organization: University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105
– sequence: 5
  givenname: Huei
  surname: Peng
  fullname: Peng, Huei
  organization: University of Michigan, Department of Mechanical Engineering, Ann Arbor, MI, USA, 48105
ContentType Conference Proceeding
DOI 10.1109/ITSC45102.2020.9294262
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1728141494
9781728141497
EndPage 7
ExternalDocumentID 9294262
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 7
ParticipantIDs ieee_primary_9294262
PublicationCentury 2000
PublicationDate 2020-09-20
PublicationDateYYYYMMDD 2020-09-20
PublicationDate_xml – month: 09
  year: 2020
  text: 2020-09-20
  day: 20
PublicationDecade 2020
PublicationTitle 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)
PublicationTitleAbbrev ITSC
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial neural networks
Autonomous vehicles
Linear programming
Optimization
Reinforcement learning
Safety
Security
Title Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization
URI https://ieeexplore.ieee.org/document/9294262
hasFullText 1
inHoldings 1