Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Bibliographic Details
Published in	IEEE Transactions on Cybernetics, Vol. 51, no. 1, pp. 174-187
Main Authors	Wang, Xiaoqiang; Ke, Liangjun; Qiao, Zhimin; Chai, Xinghua
Format	Journal Article
Language	English
Published	United States: IEEE, 01.01.2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Convergence; Double estimators; Games; Learning; Learning (artificial intelligence); Markov processes; mean-field approximation; multiagent reinforcement learning (MARL); Multiagent systems; Nash equilibrium; Simulators; Traffic control; Traffic flow; traffic signal control (TSC); Traffic signals
Summary:	Finding the optimal signal timing strategy is a difficult task in large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method for solving this problem. However, there is still room for improvement in scaling to large problems and in modeling, for each individual agent, the behaviors of the other agents. In this article, a new MARL method, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which can eliminate the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby helping agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms the state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
ISSN:	2168-2267 2168-2275
DOI:	10.1109/TCYB.2020.3015811
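
Illustrative sketch (not taken from the article): the Summary mentions independent double Q-learning built on double estimators and a UCB policy. The Python below shows one generic, tabular way those two ideas can be combined for a single agent. It is not the authors' Co-DQL: the mean-field term, the reward allocation mechanism, and local state sharing are all omitted. The class name `DoubleQUCBAgent`, its parameters (`alpha`, `gamma`, `c`), and the toy one-state environment are assumptions made only for this example.

```python
import math
import random
from collections import defaultdict


class DoubleQUCBAgent:
    """Tabular double Q-learning with UCB1-style action selection.

    Hypothetical illustration only; not the Co-DQL algorithm from the article.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, c=2.0, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.c = c          # weight of the exploration bonus (assumed value)
        self.rng = random.Random(seed)
        # Two independent estimators, plus per-state action visit counts.
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        self.counts = defaultdict(lambda: [0] * n_actions)

    def act(self, state):
        counts = self.counts[state]
        # Try every action once before applying the UCB formula.
        for a in range(self.n_actions):
            if counts[a] == 0:
                counts[a] += 1
                return a
        total = sum(counts)
        qa, qb = self.qa[state], self.qb[state]
        # Mean of both estimators plus a bonus that shrinks with visits.
        scores = [
            0.5 * (qa[a] + qb[a])
            + self.c * math.sqrt(math.log(total) / counts[a])
            for a in range(self.n_actions)
        ]
        best = max(range(self.n_actions), key=scores.__getitem__)
        counts[best] += 1
        return best

    def update(self, state, action, reward, next_state, done=False):
        # Pick at random which estimator learns; the other one evaluates.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        if done:
            target = reward
        else:
            next_vals = learn[next_state]
            # Select the greedy action with one table, value it with the other.
            a_star = max(range(self.n_actions), key=next_vals.__getitem__)
            target = reward + self.gamma * evaluate[next_state][a_star]
        learn[state][action] += self.alpha * (target - learn[state][action])


# Toy usage: one state, two actions; action 1 pays 1.0 and action 0 pays 0.0.
agent = DoubleQUCBAgent(n_actions=2, seed=1)
for _ in range(200):
    a = agent.act(0)
    agent.update(0, a, reward=float(a), next_state=0, done=True)
```

Selecting the greedy next action with one table and valuing it with the other is what decouples selection from evaluation, which is the usual argument for why double estimators reduce overestimation. The count-based bonus keeps rarely tried actions in play without a separate exploration schedule.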