Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning
Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and mode...
Saved in:
Published in | IEEE transactions on cybernetics Vol. 51; no. 1; pp. 174 - 187 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
United States
IEEE
01.01.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this article, a new MARL, called cooperative double <inline-formula> <tex-math notation="LaTeX">Q </tex-math></inline-formula>-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double <inline-formula> <tex-math notation="LaTeX">Q </tex-math></inline-formula>-learning method based on double estimators and the upper confidence bound (UCB) policy, which can eliminate the over-estimation problem existing in traditional independent <inline-formula> <tex-math notation="LaTeX">Q </tex-math></inline-formula>-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios of TSC simulators. The results show that Co-DQL outperforms the state-of-the-art decentralized MARL algorithms in terms of multiple traffic metrics. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 2168-2267 2168-2275 |
DOI: | 10.1109/TCYB.2020.3015811 |