Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Bibliographic Details
Published in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pp. 1 - 7
Main Authors: Wen, Lu; Duan, Jingliang; Li, Shengbo Eben; Xu, Shaobing; Peng, Huei
Format: Conference Proceeding
Language: English
Published: IEEE, 20.09.2020
DOI: 10.1109/ITSC45102.2020.9294262

Summary: Reinforcement learning (RL) is attracting increasing interest in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles for two predominant reasons: their behaviors are unexplainable, and they cannot guarantee safety in new scenarios. This paper presents a safe RL algorithm, called Parallel Constrained Policy Optimization (PCPO), for two autonomous driving tasks. PCPO extends today's common actor-critic architecture to a three-component learning framework, in which three neural networks approximate the policy function, the value function, and a newly added risk function, respectively. Meanwhile, a trust region constraint is added to allow large update steps without breaking the monotonic improvement condition. To ensure the feasibility of the safety-constrained problem, synchronized parallel learners are employed to explore different state spaces, which accelerates learning and policy updates. Simulations of two autonomous driving scenarios confirm that the algorithm ensures safety while achieving fast learning.
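
As a rough illustration of the three-component learning framework described in the summary, the sketch below sets up separate policy, value, and risk approximators. It is not the authors' implementation: the MLP sizes, the Gaussian policy head, and all names (PolicyValueRisk, mlp) are assumptions made for illustration only.

    # Illustrative sketch only; network shapes and policy parameterization are assumptions.
    import torch
    import torch.nn as nn

    def mlp(in_dim, out_dim, hidden=64):
        # Small fully connected approximator (size is an assumption).
        return nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim),
        )

    class PolicyValueRisk(nn.Module):
        # Three separate networks: policy pi(a|s), value V(s), and risk R(s),
        # mirroring the three-component framework described in the summary.
        def __init__(self, obs_dim, act_dim):
            super().__init__()
            self.policy_mean = mlp(obs_dim, act_dim)          # actor: mean of a Gaussian policy
            self.log_std = nn.Parameter(torch.zeros(act_dim)) # learnable policy standard deviation
            self.value = mlp(obs_dim, 1)                      # critic: expected return
            self.risk = mlp(obs_dim, 1)                       # risk network: expected constraint cost

        def forward(self, obs):
            dist = torch.distributions.Normal(self.policy_mean(obs), self.log_std.exp())
            return dist, self.value(obs).squeeze(-1), self.risk(obs).squeeze(-1)

    # Example usage (dimensions are arbitrary):
    net = PolicyValueRisk(obs_dim=4, act_dim=2)
    dist, value, risk = net(torch.randn(8, 4))

In the paper's setting, the value and risk estimates would feed a trust-region policy update with a safety constraint, and several such learners would run in parallel over different state spaces; those steps are not shown here.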