Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Cybernetics, Vol. 51, No. 1, pp. 174-187
Main Authors: Wang, Xiaoqiang; Ke, Liangjun; Qiao, Zhimin; Chai, Xinghua
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2021
Subjects
Online Access: Get full text

Abstract Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and in modeling, for each individual agent, the behaviors of the other agents. In this article, a new MARL algorithm, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby enabling agents to learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms in terms of multiple traffic metrics.
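The abstract bundles three mechanisms that are concrete enough to sketch: double estimators to remove Q-value overestimation, a UCB policy for exploration, and a mean-field summary of neighboring agents' behavior. The Python sketch below is a minimal illustration under our own assumptions (a tabular agent, a one-hot mean of neighbor actions, invented class and hyperparameter names); it is not the authors' Co-DQL implementation and omits the paper's reward allocation mechanism and local state sharing.

```python
import numpy as np
from collections import defaultdict

class DoubleQUCBAgent:
    """One intersection's controller: double Q-learning + UCB + mean-field state.

    Hypothetical sketch; the state encoding and hyperparameters are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, ucb_c=1.0):
        self.n_actions = n_actions
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.ucb_c = ucb_c    # strength of the UCB exploration bonus
        # Two independent estimators; decoupling them is what removes the
        # positive bias of single-estimator max-Q targets.
        self.q_a = defaultdict(lambda: np.zeros(n_actions))
        self.q_b = defaultdict(lambda: np.zeros(n_actions))
        self.visits = defaultdict(lambda: np.zeros(n_actions))
        self.t = 1            # global step counter for the UCB bonus

    def mean_field_key(self, state, neighbor_actions):
        # Mean-field approximation: compress the neighbors' joint action into
        # the empirical mean of their one-hot actions, then discretize it so
        # the result can index a table.
        counts = np.bincount(np.asarray(neighbor_actions, dtype=int),
                             minlength=self.n_actions)
        mean_action = counts / max(len(neighbor_actions), 1)
        return (state, tuple(np.round(mean_action, 2)))

    def act(self, state, neighbor_actions):
        key = self.mean_field_key(state, neighbor_actions)
        q = (self.q_a[key] + self.q_b[key]) / 2.0
        # UCB policy: rarely tried actions receive a large bonus, so
        # exploration is ensured without epsilon-greedy randomness.
        bonus = self.ucb_c * np.sqrt(np.log(self.t) / (self.visits[key] + 1e-8))
        action = int(np.argmax(q + bonus))
        self.visits[key][action] += 1
        self.t += 1
        return action, key

    def update(self, key, action, reward, next_key):
        # Double Q-learning: one estimator selects the greedy action, the
        # *other* evaluates it, decorrelating selection from evaluation.
        if np.random.rand() < 0.5:
            a_star = int(np.argmax(self.q_a[next_key]))
            target = reward + self.gamma * self.q_b[next_key][a_star]
            self.q_a[key][action] += self.alpha * (target - self.q_a[key][action])
        else:
            b_star = int(np.argmax(self.q_b[next_key]))
            target = reward + self.gamma * self.q_a[next_key][b_star]
            self.q_b[key][action] += self.alpha * (target - self.q_b[key][action])

# Toy usage with made-up states: 4 signal phases, two neighbors.
agent = DoubleQUCBAgent(n_actions=4)
action, key = agent.act(state=("queues", 3, 1), neighbor_actions=[2, 0])
next_key = agent.mean_field_key(("queues", 2, 1), [2, 1])
agent.update(key, action, reward=-1.5, next_key=next_key)  # e.g. negative queue length
```

The mean-field step is what keeps this scalable: instead of conditioning on the exponentially large joint action of all neighbors, each agent conditions on a single averaged "virtual neighbor", which matches the abstract's claim of high scalability.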
Author Wang, Xiaoqiang
Ke, Liangjun
Chai, Xinghua
Qiao, Zhimin
Author_xml – sequence: 1
  givenname: Xiaoqiang
  orcidid: 0000-0002-3783-1268
  surname: Wang
  fullname: Wang, Xiaoqiang
  email: wangxq5127@stu.xjtu.edu.cn
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 2
  givenname: Liangjun
  orcidid: 0000-0002-2920-0853
  surname: Ke
  fullname: Ke, Liangjun
  email: keljxjtu@xjtu.edu.cn
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 3
  givenname: Zhimin
  orcidid: 0000-0001-5829-4353
  surname: Qiao
  fullname: Qiao, Zhimin
  email: qiao.miracle@gmail.com
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 4
  givenname: Xinghua
  surname: Chai
  fullname: Chai, Xinghua
  email: cetc54008@yeah.net
  organization: CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32881705 (View this record in MEDLINE/PubMed)
BookMark eNp9kU1PGzEQhq0KxFf5ARUSWqkXLpv6M_YeaQS0UmglCAdOK693HBk5Nti7SP339SppDhzqy9ij553xzHuKDkIMgNAXgmeE4ObbavH8fUYxxTOGiVCEfEInlMxVTakUB_v7XB6j85xfcDmqpBp1hI4ZVYpILE7Qw1KnNdSPRnuoVklb60z16NZB-2oRw5Cir56yC-tKV7_iO_jqfvSD02sIQ_UALtiYDGym1xJ0CoX8jA6t9hnOd_EMPd3erBY_6uXvu5-L62VtGG-GupeCdnPJieCmwR1lFuuOGMp5b4ws8yglABugHGshyuextQBWs57TrhcdO0NX27qvKb6NkId247IB73WAOOa2VMJcYkabgn79gL7EMZUZJ0pSRiQhtFCXO2rsNtC3r8ltdPrT_ttWAcgWMCnmnMDuEYLbyZR2MqWdTGl3phSN_KAxbtCDm3arnf-v8mKrdACw79QQNWeUs7_MKZbv
CODEN ITCEB8
CitedBy_id crossref_primary_10_1109_ACCESS_2021_3059496
crossref_primary_10_1109_TITS_2022_3229477
crossref_primary_10_1109_ACCESS_2024_3410318
crossref_primary_10_3390_info13090408
crossref_primary_10_3390_electronics13010198
crossref_primary_10_1109_TII_2023_3296887
crossref_primary_10_1109_ACCESS_2023_3284316
crossref_primary_10_1038_s41598_023_36606_2
crossref_primary_10_3390_su16052160
crossref_primary_10_1016_j_ins_2024_120560
crossref_primary_10_1109_TCYB_2024_3385910
crossref_primary_10_3390_su14105751
crossref_primary_10_1016_j_ins_2023_119484
crossref_primary_10_1016_j_ins_2022_12_043
crossref_primary_10_1016_j_eswa_2023_121111
crossref_primary_10_1109_TCYB_2023_3266448
crossref_primary_10_3390_app131910803
crossref_primary_10_15622_ia_22_1_1
crossref_primary_10_3390_su14010107
crossref_primary_10_1109_TVT_2024_3444475
crossref_primary_10_1049_itr2_12328
crossref_primary_10_1049_itr2_12208
crossref_primary_10_1109_ACCESS_2025_3544961
crossref_primary_10_1109_TITS_2023_3344590
crossref_primary_10_1109_ACCESS_2023_3275883
crossref_primary_10_1049_itr2_12364
crossref_primary_10_1109_TCYB_2022_3179775
crossref_primary_10_1109_TNNLS_2023_3265358
crossref_primary_10_1109_TCDS_2023_3281878
crossref_primary_10_1002_cjce_24878
crossref_primary_10_1061_JTEPBS_TEENG_8376
crossref_primary_10_1016_j_engappai_2024_108100
crossref_primary_10_3390_computers11030038
crossref_primary_10_1016_j_swevo_2024_101588
crossref_primary_10_1109_JIOT_2024_3401829
crossref_primary_10_1139_cjce_2022_0273
crossref_primary_10_3390_s23052373
crossref_primary_10_1109_TCYB_2021_3107202
crossref_primary_10_1109_OJITS_2021_3126126
crossref_primary_10_1109_MITS_2022_3144797
crossref_primary_10_1109_ACCESS_2022_3214481
crossref_primary_10_1109_TITS_2024_3352446
crossref_primary_10_1007_s10489_023_04652_y
crossref_primary_10_1007_s42421_024_00093_2
crossref_primary_10_1080_15472450_2023_2270428
crossref_primary_10_1007_s10489_022_03643_9
crossref_primary_10_1016_j_engappai_2022_105019
crossref_primary_10_1007_s10489_024_05933_w
crossref_primary_10_1016_j_epsr_2023_110068
crossref_primary_10_4018_IJACI_323196
crossref_primary_10_1080_21680566_2024_2337216
crossref_primary_10_1007_s10489_022_03840_6
crossref_primary_10_1016_j_engappai_2025_110440
crossref_primary_10_1109_TITS_2022_3173490
crossref_primary_10_1016_j_engappai_2023_106033
crossref_primary_10_1007_s11801_024_3267_2
crossref_primary_10_1109_TCYB_2022_3223918
crossref_primary_10_12677_orf_2024_142143
crossref_primary_10_1007_s10489_021_02256_y
crossref_primary_10_1109_TCYB_2024_3356981
crossref_primary_10_1109_TVT_2022_3176620
crossref_primary_10_1016_j_isci_2024_109751
crossref_primary_10_1155_2021_9954267
crossref_primary_10_1109_JAS_2024_124365
crossref_primary_10_3390_app122412783
crossref_primary_10_1109_TCYB_2021_3117705
crossref_primary_10_1109_ACCESS_2024_3395249
crossref_primary_10_1145_3695986
crossref_primary_10_48130_DTS_2023_0012
crossref_primary_10_1109_JIOT_2023_3342480
crossref_primary_10_1109_JIOT_2023_3284510
crossref_primary_10_1038_s41598_023_46074_3
crossref_primary_10_1016_j_eswa_2023_120535
crossref_primary_10_1016_j_geits_2023_100124
crossref_primary_10_1016_j_aej_2022_12_028
crossref_primary_10_1007_s13177_022_00315_3
crossref_primary_10_3390_electronics11030465
crossref_primary_10_1109_LWC_2022_3205503
crossref_primary_10_1007_s13177_024_00426_z
crossref_primary_10_1016_j_patcog_2023_109917
crossref_primary_10_1016_j_knosys_2025_113022
Cites_doi 10.1080/15325000490195970
10.1049/iet-its.2009.0070
10.1287/mnsc.1050.0451
10.1145/3308558.3313433
10.1073/pnas.39.10.1953
10.1016/j.trc.2017.09.020
10.1109/TSMCA.2010.2052606
10.1109/TITS.2014.2347300
10.1038/nature14539
10.1162/089976699300016070
10.1145/3219819.3220096
10.1016/j.engappai.2011.04.011
10.1016/S0191-2615(03)00015-8
10.1109/ACC.2016.7525014
10.1109/TCYB.2019.2904742
10.1016/B978-1-55860-307-3.50049-6
10.1049/iet-its.2015.0108
10.1038/nature14236
10.1109/TITS.2019.2901791
10.1145/3068287
10.1109/TITS.2013.2255286
10.1016/j.trc.2013.08.014
10.1109/ITSC.2014.6958095
10.1023/A:1013689704352
10.1109/TITS.2010.2091408
10.1016/j.trpro.2015.09.070
10.1109/ITSC.2011.6083114
10.1016/B978-1-55860-335-6.50027-1
10.1109/TITS.2006.874716
10.1109/AMS.2014.16
10.1162/neco.1994.6.6.1185
10.1007/s10458-008-9046-9
10.1109/CVPR.2018.00493
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TCYB.2020.3015811
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Aerospace Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic
Aerospace Database
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Sciences (General)
EISSN 2168-2275
EndPage 187
ExternalDocumentID 32881705
10_1109_TCYB_2020_3015811
9186324
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61973244; 61573277
  funderid: 10.13039/501100001809
GroupedDBID 0R~
4.4
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACIWK
AENEX
AGQYO
AGSQL
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
HZ~
IFIPE
IPLJI
JAVBF
M43
O9-
OCL
PQQKQ
RIA
RIE
RNS
AAYXX
CITATION
RIG
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c349t-d752b674154c90b23f0ab1c244dcc7811885e0ce240a550080ffeefa3d42bd5b3
IEDL.DBID RIE
ISSN 2168-2267
2168-2275
IngestDate Fri Jul 11 03:51:29 EDT 2025
Mon Jun 30 02:32:31 EDT 2025
Thu Jan 02 22:58:50 EST 2025
Tue Jul 01 00:53:56 EDT 2025
Thu Apr 24 23:10:14 EDT 2025
Wed Aug 27 02:32:32 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c349t-d752b674154c90b23f0ab1c244dcc7811885e0ce240a550080ffeefa3d42bd5b3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-2920-0853
0000-0002-3783-1268
0000-0001-5829-4353
PMID 32881705
PQID 2472317112
PQPubID 85422
PageCount 14
ParticipantIDs proquest_miscellaneous_2440470329
crossref_primary_10_1109_TCYB_2020_3015811
crossref_citationtrail_10_1109_TCYB_2020_3015811
pubmed_primary_32881705
proquest_journals_2472317112
ieee_primary_9186324
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2021-Jan.
2021-1-00
2021-Jan
20210101
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – month: 01
  year: 2021
  text: 2021-Jan.
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2021
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref15
koonce (ref4) 2008
ref11
codeca (ref43) 2018
van hasselt (ref28) 2016
ref19
brockman (ref45) 2016
shamshirband (ref18) 2012; 9
ref46
lillicrap (ref14) 2015
ref48
ref42
ref41
lowe (ref34) 2017
ref44
ref49
vezhnevets (ref47) 2017; 70
ref8
ref7
kuyer (ref22) 2008
ref3
ref6
ref5
stanley (ref26) 1971
ref40
ref35
ref37
casas (ref16) 2017
claus (ref17) 1998
ref36
ref31
ref30
ref33
ref32
hasselt (ref24) 2010
lecun (ref12) 2015; 521
ref2
ref1
ref39
ref38
wiering (ref10) 2004
sutton (ref9) 2018
ref23
ref25
ref20
yang (ref27) 2018
ref21
ref29
References_xml – ident: ref2
  doi: 10.1080/15325000490195970
– ident: ref19
  doi: 10.1049/iet-its.2009.0070
– ident: ref29
  doi: 10.1287/mnsc.1050.0451
– year: 1971
  ident: ref26
  publication-title: Phase Transitions and Critical Phenomena
– ident: ref35
  doi: 10.1145/3308558.3313433
– ident: ref30
  doi: 10.1073/pnas.39.10.1953
– ident: ref38
  doi: 10.1016/j.trc.2017.09.020
– ident: ref7
  doi: 10.1109/TSMCA.2010.2052606
– ident: ref49
  doi: 10.1109/TITS.2014.2347300
– year: 2018
  ident: ref27
  publication-title: Mean field multi-agent reinforcement learning
– volume: 521
  start-page: 436
  year: 2015
  ident: ref12
  article-title: Deep learning
  publication-title: Nature
  doi: 10.1038/nature14539
– ident: ref37
  doi: 10.1162/089976699300016070
– start-page: 2613
  year: 2010
  ident: ref24
  article-title: Double Q-learning
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref15
  doi: 10.1145/3219819.3220096
– ident: ref6
  doi: 10.1016/j.engappai.2011.04.011
– ident: ref5
  doi: 10.1016/S0191-2615(03)00015-8
– ident: ref42
  doi: 10.1109/ACC.2016.7525014
– ident: ref41
  doi: 10.1109/TCYB.2019.2904742
– year: 2018
  ident: ref9
  publication-title: Reinforcement Learning An Introduction
– ident: ref21
  doi: 10.1016/B978-1-55860-307-3.50049-6
– year: 2016
  ident: ref45
  publication-title: OpenAI Gym
– start-page: 6379
  year: 2017
  ident: ref34
  article-title: Multi-agent actor-critic for mixed cooperative-competitive environments
  publication-title: Proc Adv Neural Inf Process Syst
– start-page: 222
  year: 2016
  ident: ref28
  article-title: Deep reinforcement learning with double Q-learning
  publication-title: Proc 13th AAAI Conf Artif Intell
– start-page: 43
  year: 2018
  ident: ref43
  article-title: Monaco SUMO Traffic (MoST) scenario: A 3D mobility scenario for cooperative ITS
  publication-title: Proc SUMO User Conf Simulating Auton Intermodal Transp Syst
– ident: ref3
  doi: 10.1049/iet-its.2015.0108
– ident: ref13
  doi: 10.1038/nature14236
– year: 2017
  ident: ref16
  publication-title: Deep deterministic policy gradient for urban traffic light control
– volume: 70
  start-page: 3540
  year: 2017
  ident: ref47
  article-title: Feudal networks for hierarchical reinforcement learning
  publication-title: Proc 34th Int Conf Mach Learn
– year: 2004
  ident: ref10
  article-title: Intelligent traffic light control
– ident: ref23
  doi: 10.1109/TITS.2019.2901791
– ident: ref1
  doi: 10.1145/3068287
– volume: 9
  start-page: 148
  year: 2012
  ident: ref18
  article-title: A distributed approach for coordination between traffic lights based on game theory
  publication-title: Int Arab J Inf Technol
– ident: ref39
  doi: 10.1109/TITS.2013.2255286
– ident: ref48
  doi: 10.1016/j.trc.2013.08.014
– year: 2008
  ident: ref4
  article-title: Traffic signal timing manual
– start-page: 746
  year: 1998
  ident: ref17
  article-title: The dynamics of reinforcement learning in cooperative multiagent systems
  publication-title: Proc 15th Nat Conf Artif Intell/10th Conf Innov Appl Artif Intell
– ident: ref32
  doi: 10.1109/ITSC.2014.6958095
– start-page: 656
  year: 2008
  ident: ref22
  article-title: Multiagent reinforcement learning for urban traffic control using coordination graphs
  publication-title: Machine Learning and Knowledge Discovery in Databases
– ident: ref25
  doi: 10.1023/A:1013689704352
– ident: ref11
  doi: 10.1109/TITS.2010.2091408
– ident: ref40
  doi: 10.1016/j.trpro.2015.09.070
– ident: ref20
  doi: 10.1109/ITSC.2011.6083114
– ident: ref31
  doi: 10.1016/B978-1-55860-335-6.50027-1
– year: 2015
  ident: ref14
  publication-title: Continuous control with deep reinforcement learning
– ident: ref8
  doi: 10.1109/TITS.2006.874716
– ident: ref44
  doi: 10.1109/AMS.2014.16
– ident: ref36
  doi: 10.1162/neco.1994.6.6.1185
– ident: ref33
  doi: 10.1007/s10458-008-9046-9
– ident: ref46
  doi: 10.1109/CVPR.2018.00493
SSID ssj0000816898
Score 2.602133
Snippet Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 174
SubjectTerms Algorithms
Convergence
Double estimators
Games
Learning
Learning (artificial intelligence)
Markov processes
mean-field approximation
multiagent reinforcement learning (MARL)
Multiagent systems
Nash equilibrium
Simulators
Traffic control
Traffic flow
traffic signal control (TSC)
Traffic signals
Title Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning
URI https://ieeexplore.ieee.org/document/9186324
https://www.ncbi.nlm.nih.gov/pubmed/32881705
https://www.proquest.com/docview/2472317112
https://www.proquest.com/docview/2440470329
Volume 51
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1Lb9QwEB61PXEB-oKUUrkSh4Lw1rGdjXOkq1YVoj30IZVTFL8qRLVb0V0O_HpmHG8kUIu4WYqdh2ccfzOe-QbgnbNGaC8j92PTcG1Cx804NhzNW-WUsj4mh_7Z-fj0Wn--qW5W4OOQCxNCSMFnYUTNdJbvZ25BrrLDpjTELr4Kq2i49blagz8lFZBIpW8lNjiiijofYpaiObyafD1CY1CijYr7nympQIyShtjpqj92pFRi5Wm0mXadkxdwtnzfPtjk-2gxtyP36y8qx__9oJfwPMNP9qnXl3VYCdMNWM8L_IEdZBbq95tw8YVixPklyjAw3NGIaoJdfrul4ZM-vp2leAPWsfPZz3DHUi5vR6la7CIkQlaXfI8sc7jebsH1yfHV5JTnAgzcKd3Mua8raceEObRrhJUqis6WDhGBd45SVI2pgnABUUGHlg6CzxhDiJ3yWlpfWbUNa9PZNLwG5r1GS7IydfRRRyoiqKKNvo5ROl1KUYBYCqF1mZ2cimTctclKEU1LImxJhG0WYQEfhiH3PTXHvzpv0vQPHfPMF7C7lHSbF-9DK3WNqLdGJFrA_nAZlx2dpXTTMFtQHy00_i1lU8CrXkOGey8Va-fxZ76BZ5ICY5IfZxfW5j8W4S0im7ndSyr9G1et8BY
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwzV1fb9MwED-NIcFegDEGYQOMBBIguXNtp3EeeIDC1LGuD1snjacs_jchphbRFjQ-C1-F78bZcSOBgLdJvEWKnSi-i-9357vfATwxWjFpuae2p0oqlaup6vmSonsrjBDa-hjQPxj1Bsfy3Ul-sgLf21oY51xMPnOdcBnP8u3ULEKobKfsqsAunlIo993FV3TQZi_33qA0n3K--3bcH9DUQ4AaIcs5tUXOdS-YTWlKprnwrNZdg0bNGhOqLJXKHTMODVuNYB3xk_fO-VpYybXNtcDnXoGriDNy3lSHtRGc2LIiNtvleEERxxTp2LTLyp1x__1rdD85esVocfFFa3BNcBX48PJfbGBs6vJ3fBvt3O5N-LFcoSa95WNnMdcd8-038sj_dQlvwY0EsMmr5o9YhxU3uQ3raQubkWeJZ_v5BhwOQxY8PUItdQRtdiDTIEcfzsL0fpPBT2JGBanJaPrFnZNYrVyHYjRy6CLlrInRVZJYas_uwPGlfNsmrE6mE3cPiLUSfeVcFd566UObROG1t4X33MguZxmwpdArk_jXQxuQ8yr6YaysgspUQWWqpDIZvGinfGrIR_41eCOIux2YJJ3B9lKzqrQ9zSouC8T1BWLtDB63t3FjCadF9cRNF2GMZBLtAS8zuNtoZPvspSLf__M7H8H1wfhgWA33RvtbsMZDGlCMWm3D6vzzwj1AHDfXD-PvROD0spXvJwkbTY8
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Large-Scale+Traffic+Signal+Control+Using+a+Novel+Multiagent+Reinforcement+Learning&rft.jtitle=IEEE+transactions+on+cybernetics&rft.au=Wang%2C+Xiaoqiang&rft.au=Ke%2C+Liangjun&rft.au=Qiao%2C+Zhimin&rft.au=Chai%2C+Xinghua&rft.date=2021-01-01&rft.issn=2168-2275&rft.eissn=2168-2275&rft.volume=51&rft.issue=1&rft.spage=174&rft_id=info:doi/10.1109%2FTCYB.2020.3015811&rft.externalDBID=NO_FULL_TEXT
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2168-2267&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2168-2267&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2168-2267&client=summon