Adaptive Decision-Making for Automated Vehicles Under Roundabout Scenarios Using Optimization Embedded Reinforcement Learning

The roundabout is a typical changeable, interactive scenario in which automated vehicles should make adaptive and safe decisions. In this article, an optimization embedded reinforcement learning (OERL) is proposed to achieve adaptive decision-making under the roundabout. The promotion is the modifie...

Full description

Saved in:
Bibliographic Details
Published inIEEE transaction on neural networks and learning systems Vol. 32; no. 12; pp. 5526 - 5538
Main Authors Zhang, Yuxiang, Gao, Bingzhao, Guo, Lulu, Guo, Hongyan, Chen, Hong
Format Journal Article
LanguageEnglish
Published United States IEEE 01.12.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The roundabout is a typical changeable, interactive scenario in which automated vehicles should make adaptive and safe decisions. In this article, an optimization embedded reinforcement learning (OERL) is proposed to achieve adaptive decision-making under the roundabout. The promotion is the modified actor of the Actor-Critic framework, which embeds the model-based optimization method in reinforcement learning to explore continuous behaviors in action space directly. Therefore, the proposed method can determine the macroscale behavior (change lane or not) and medium-scale behaviors of desired acceleration and action time simultaneously with high sample efficiency. When scenarios change, medium-scale behaviors can be adjusted timely by the embedded direct search method, promoting the adaptability of decision-making. More notably, the modified actor matches human drivers' behaviors, macroscale behavior captures the human mind's jump, and medium-scale behaviors are preferentially adjusted through driving skills. To enable the agent adapts to different types of the roundabout, task representation is designed to restructure the policy network. In experiments, the algorithm efficiency and the learned driving strategy are compared with decision-making containing macroscale behavior and constant medium-scale behaviors of the desired acceleration and action time. To investigate the adaptability, the performance under an untrained type of roundabout and two more dangerous situations are simulated to verify that the proposed method changes the decisions with changeable scenarios accordingly. The results show that the proposed method has high algorithm efficiency and better system performance.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2162-237X
2162-2388
2162-2388
DOI:10.1109/TNNLS.2020.3042981