Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Cybernetics, Vol. 51, No. 1, pp. 174-187
Main Authors: Wang, Xiaoqiang; Ke, Liangjun; Qiao, Zhimin; Chai, Xinghua
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2021
Subjects
Online Access: Get full text

Abstract Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and in modeling, for each individual agent, the behaviors of the other agents. In this article, a new MARL algorithm, called cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby enabling agents to learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms in terms of multiple traffic metrics.
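The abstract bundles three mechanisms that are concrete enough to sketch: double estimators to remove Q-value overestimation, a UCB policy for exploration, and a mean-field summary of neighboring agents' behavior. The Python sketch below is a minimal illustration under our own assumptions (a tabular agent, a one-hot mean of neighbor actions, invented class and hyperparameter names); it is not the authors' Co-DQL implementation and omits the paper's reward allocation mechanism and local state sharing.

```python
import numpy as np
from collections import defaultdict

class DoubleQUCBAgent:
    """One intersection's controller: double Q-learning + UCB + mean-field state.

    Hypothetical sketch; the state encoding and hyperparameters are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, ucb_c=1.0):
        self.n_actions = n_actions
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.ucb_c = ucb_c    # strength of the UCB exploration bonus
        # Two independent estimators; decoupling them is what removes the
        # positive bias of single-estimator max-Q targets.
        self.q_a = defaultdict(lambda: np.zeros(n_actions))
        self.q_b = defaultdict(lambda: np.zeros(n_actions))
        self.visits = defaultdict(lambda: np.zeros(n_actions))
        self.t = 1            # global step counter for the UCB bonus

    def mean_field_key(self, state, neighbor_actions):
        # Mean-field approximation: compress the neighbors' joint action into
        # the empirical mean of their one-hot actions, then discretize it so
        # the result can index a table.
        counts = np.bincount(np.asarray(neighbor_actions, dtype=int),
                             minlength=self.n_actions)
        mean_action = counts / max(len(neighbor_actions), 1)
        return (state, tuple(np.round(mean_action, 2)))

    def act(self, state, neighbor_actions):
        key = self.mean_field_key(state, neighbor_actions)
        q = (self.q_a[key] + self.q_b[key]) / 2.0
        # UCB policy: rarely tried actions receive a large bonus, so
        # exploration is ensured without epsilon-greedy randomness.
        bonus = self.ucb_c * np.sqrt(np.log(self.t) / (self.visits[key] + 1e-8))
        action = int(np.argmax(q + bonus))
        self.visits[key][action] += 1
        self.t += 1
        return action, key

    def update(self, key, action, reward, next_key):
        # Double Q-learning: one estimator selects the greedy action, the
        # *other* evaluates it, decorrelating selection from evaluation.
        if np.random.rand() < 0.5:
            a_star = int(np.argmax(self.q_a[next_key]))
            target = reward + self.gamma * self.q_b[next_key][a_star]
            self.q_a[key][action] += self.alpha * (target - self.q_a[key][action])
        else:
            b_star = int(np.argmax(self.q_b[next_key]))
            target = reward + self.gamma * self.q_a[next_key][b_star]
            self.q_b[key][action] += self.alpha * (target - self.q_b[key][action])

# Toy usage with made-up states: 4 signal phases, two neighbors.
agent = DoubleQUCBAgent(n_actions=4)
action, key = agent.act(state=("queues", 3, 1), neighbor_actions=[2, 0])
next_key = agent.mean_field_key(("queues", 2, 1), [2, 1])
agent.update(key, action, reward=-1.5, next_key=next_key)  # e.g. negative queue length
```

The mean-field step is what keeps this scalable: instead of conditioning on the exponentially large joint action of all neighbors, each agent conditions on a single averaged "virtual neighbor", which matches the abstract's claim of high scalability.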
Author Wang, Xiaoqiang
Ke, Liangjun
Chai, Xinghua
Qiao, Zhimin
Author_xml – sequence: 1
  givenname: Xiaoqiang
  orcidid: 0000-0002-3783-1268
  surname: Wang
  fullname: Wang, Xiaoqiang
  email: wangxq5127@stu.xjtu.edu.cn
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 2
  givenname: Liangjun
  orcidid: 0000-0002-2920-0853
  surname: Ke
  fullname: Ke, Liangjun
  email: keljxjtu@xjtu.edu.cn
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 3
  givenname: Zhimin
  orcidid: 0000-0001-5829-4353
  surname: Qiao
  fullname: Qiao, Zhimin
  email: qiao.miracle@gmail.com
  organization: State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 4
  givenname: Xinghua
  surname: Chai
  fullname: Chai, Xinghua
  email: cetc54008@yeah.net
  organization: CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/32881705 (View this record in MEDLINE/PubMed)
BookMark eNp9kU1PGzEQhq0KxFf5ARUSWqkXLpv6M_YeaQS0UmglCAdOK693HBk5Nti7SP339SppDhzqy9ij553xzHuKDkIMgNAXgmeE4ObbavH8fUYxxTOGiVCEfEInlMxVTakUB_v7XB6j85xfcDmqpBp1hI4ZVYpILE7Qw1KnNdSPRnuoVklb60z16NZB-2oRw5Cir56yC-tKV7_iO_jqfvSD02sIQ_UALtiYDGym1xJ0CoX8jA6t9hnOd_EMPd3erBY_6uXvu5-L62VtGG-GupeCdnPJieCmwR1lFuuOGMp5b4ws8yglABugHGshyuextQBWs57TrhcdO0NX27qvKb6NkId247IB73WAOOa2VMJcYkabgn79gL7EMZUZJ0pSRiQhtFCXO2rsNtC3r8ltdPrT_ttWAcgWMCnmnMDuEYLbyZR2MqWdTGl3phSN_KAxbtCDm3arnf-v8mKrdACw79QQNWeUs7_MKZbv
CODEN ITCEB8
CitedBy_id crossref_primary_10_1109_ACCESS_2021_3059496
crossref_primary_10_1109_TITS_2022_3229477
crossref_primary_10_1109_ACCESS_2024_3410318
crossref_primary_10_3390_info13090408
crossref_primary_10_3390_electronics13010198
crossref_primary_10_1109_TII_2023_3296887
crossref_primary_10_1109_ACCESS_2023_3284316
crossref_primary_10_1038_s41598_023_36606_2
crossref_primary_10_3390_su16052160
crossref_primary_10_1016_j_ins_2024_120560
crossref_primary_10_1109_TCYB_2024_3385910
crossref_primary_10_3390_su14105751
crossref_primary_10_1016_j_ins_2023_119484
crossref_primary_10_1016_j_ins_2022_12_043
crossref_primary_10_1016_j_eswa_2023_121111
crossref_primary_10_1109_TCYB_2023_3266448
crossref_primary_10_3390_app131910803
crossref_primary_10_15622_ia_22_1_1
crossref_primary_10_3390_su14010107
crossref_primary_10_1109_TVT_2024_3444475
crossref_primary_10_1049_itr2_12328
crossref_primary_10_1049_itr2_12208
crossref_primary_10_1109_ACCESS_2025_3544961
crossref_primary_10_1109_TITS_2023_3344590
crossref_primary_10_1109_ACCESS_2023_3275883
crossref_primary_10_1049_itr2_12364
crossref_primary_10_1109_TCYB_2022_3179775
crossref_primary_10_1109_TNNLS_2023_3265358
crossref_primary_10_1109_TCDS_2023_3281878
crossref_primary_10_1002_cjce_24878
crossref_primary_10_1061_JTEPBS_TEENG_8376
crossref_primary_10_1016_j_engappai_2024_108100
crossref_primary_10_3390_computers11030038
crossref_primary_10_1016_j_swevo_2024_101588
crossref_primary_10_1109_JIOT_2024_3401829
crossref_primary_10_1139_cjce_2022_0273
crossref_primary_10_3390_s23052373
crossref_primary_10_1109_TCYB_2021_3107202
crossref_primary_10_1109_OJITS_2021_3126126
crossref_primary_10_1109_MITS_2022_3144797
crossref_primary_10_1109_ACCESS_2022_3214481
crossref_primary_10_1109_TITS_2024_3352446
crossref_primary_10_1007_s10489_023_04652_y
crossref_primary_10_1007_s42421_024_00093_2
crossref_primary_10_1080_15472450_2023_2270428
crossref_primary_10_1007_s10489_022_03643_9
crossref_primary_10_1016_j_engappai_2022_105019
crossref_primary_10_1007_s10489_024_05933_w
crossref_primary_10_1016_j_epsr_2023_110068
crossref_primary_10_4018_IJACI_323196
crossref_primary_10_1080_21680566_2024_2337216
crossref_primary_10_1007_s10489_022_03840_6
crossref_primary_10_1016_j_engappai_2025_110440
crossref_primary_10_1109_TITS_2022_3173490
crossref_primary_10_1016_j_engappai_2023_106033
crossref_primary_10_1007_s11801_024_3267_2
crossref_primary_10_1109_TCYB_2022_3223918
crossref_primary_10_12677_orf_2024_142143
crossref_primary_10_1007_s10489_021_02256_y
crossref_primary_10_1109_TCYB_2024_3356981
crossref_primary_10_1109_TVT_2022_3176620
crossref_primary_10_1016_j_isci_2024_109751
crossref_primary_10_1155_2021_9954267
crossref_primary_10_1109_JAS_2024_124365
crossref_primary_10_3390_app122412783
crossref_primary_10_1109_TCYB_2021_3117705
crossref_primary_10_1109_ACCESS_2024_3395249
crossref_primary_10_1145_3695986
crossref_primary_10_48130_DTS_2023_0012
crossref_primary_10_1109_JIOT_2023_3342480
crossref_primary_10_1109_JIOT_2023_3284510
crossref_primary_10_1038_s41598_023_46074_3
crossref_primary_10_1016_j_eswa_2023_120535
crossref_primary_10_1016_j_geits_2023_100124
crossref_primary_10_1016_j_aej_2022_12_028
crossref_primary_10_1007_s13177_022_00315_3
crossref_primary_10_3390_electronics11030465
crossref_primary_10_1109_LWC_2022_3205503
crossref_primary_10_1007_s13177_024_00426_z
crossref_primary_10_1016_j_patcog_2023_109917
crossref_primary_10_1016_j_knosys_2025_113022
Cites_doi 10.1080/15325000490195970
10.1049/iet-its.2009.0070
10.1287/mnsc.1050.0451
10.1145/3308558.3313433
10.1073/pnas.39.10.1953
10.1016/j.trc.2017.09.020
10.1109/TSMCA.2010.2052606
10.1109/TITS.2014.2347300
10.1038/nature14539
10.1162/089976699300016070
10.1145/3219819.3220096
10.1016/j.engappai.2011.04.011
10.1016/S0191-2615(03)00015-8
10.1109/ACC.2016.7525014
10.1109/TCYB.2019.2904742
10.1016/B978-1-55860-307-3.50049-6
10.1049/iet-its.2015.0108
10.1038/nature14236
10.1109/TITS.2019.2901791
10.1145/3068287
10.1109/TITS.2013.2255286
10.1016/j.trc.2013.08.014
10.1109/ITSC.2014.6958095
10.1023/A:1013689704352
10.1109/TITS.2010.2091408
10.1016/j.trpro.2015.09.070
10.1109/ITSC.2011.6083114
10.1016/B978-1-55860-335-6.50027-1
10.1109/TITS.2006.874716
10.1109/AMS.2014.16
10.1162/neco.1994.6.6.1185
10.1007/s10458-008-9046-9
10.1109/CVPR.2018.00493
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TCYB.2020.3015811
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Aerospace Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic
Aerospace Database
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Sciences (General)
EISSN 2168-2275
EndPage 187
ExternalDocumentID 32881705
10_1109_TCYB_2020_3015811
9186324
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61973244; 61573277
  funderid: 10.13039/501100001809
GroupedDBID 0R~
4.4
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACIWK
AENEX
AGQYO
AGSQL
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
HZ~
IFIPE
IPLJI
JAVBF
M43
O9-
OCL
PQQKQ
RIA
RIE
RNS
AAYXX
CITATION
RIG
NPM
7SC
7SP
7TB
8FD
F28
FR3
H8D
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c349t-d752b674154c90b23f0ab1c244dcc7811885e0ce240a550080ffeefa3d42bd5b3
IEDL.DBID RIE
ISSN 2168-2267
2168-2275
IngestDate Fri Jul 11 03:51:29 EDT 2025
Mon Jun 30 02:32:31 EDT 2025
Thu Jan 02 22:58:50 EST 2025
Tue Jul 01 00:53:56 EDT 2025
Thu Apr 24 23:10:14 EDT 2025
Wed Aug 27 02:32:32 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c349t-d752b674154c90b23f0ab1c244dcc7811885e0ce240a550080ffeefa3d42bd5b3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-2920-0853
0000-0002-3783-1268
0000-0001-5829-4353
PMID 32881705
PQID 2472317112
PQPubID 85422
PageCount 14
ParticipantIDs proquest_miscellaneous_2440470329
crossref_primary_10_1109_TCYB_2020_3015811
crossref_citationtrail_10_1109_TCYB_2020_3015811
pubmed_primary_32881705
proquest_journals_2472317112
ieee_primary_9186324
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2021-Jan.
2021-1-00
2021-Jan
20210101
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – month: 01
  year: 2021
  text: 2021-Jan.
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE transactions on cybernetics
PublicationTitleAbbrev TCYB
PublicationTitleAlternate IEEE Trans Cybern
PublicationYear 2021
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref15
koonce (ref4) 2008
ref11
codeca (ref43) 2018
van hasselt (ref28) 2016
ref19
brockman (ref45) 2016
shamshirband (ref18) 2012; 9
ref46
lillicrap (ref14) 2015
ref48
ref42
ref41
lowe (ref34) 2017
ref44
ref49
vezhnevets (ref47) 2017; 70
ref8
ref7
kuyer (ref22) 2008
ref3
ref6
ref5
stanley (ref26) 1971
ref40
ref35
ref37
casas (ref16) 2017
claus (ref17) 1998
ref36
ref31
ref30
ref33
ref32
hasselt (ref24) 2010
lecun (ref12) 2015; 521
ref2
ref1
ref39
ref38
wiering (ref10) 2004
sutton (ref9) 2018
ref23
ref25
ref20
yang (ref27) 2018
ref21
ref29
References_xml – ident: ref2
  doi: 10.1080/15325000490195970
– ident: ref19
  doi: 10.1049/iet-its.2009.0070
– ident: ref29
  doi: 10.1287/mnsc.1050.0451
– year: 1971
  ident: ref26
  publication-title: Phase Transitions and Critical Phenomena
– ident: ref35
  doi: 10.1145/3308558.3313433
– ident: ref30
  doi: 10.1073/pnas.39.10.1953
– ident: ref38
  doi: 10.1016/j.trc.2017.09.020
– ident: ref7
  doi: 10.1109/TSMCA.2010.2052606
– ident: ref49
  doi: 10.1109/TITS.2014.2347300
– year: 2018
  ident: ref27
  publication-title: Mean field multi-agent reinforcement learning
– volume: 521
  start-page: 436
  year: 2015
  ident: ref12
  article-title: Deep learning
  publication-title: Nature
  doi: 10.1038/nature14539
– ident: ref37
  doi: 10.1162/089976699300016070
– start-page: 2613
  year: 2010
  ident: ref24
  article-title: Double Q-learning
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref15
  doi: 10.1145/3219819.3220096
– ident: ref6
  doi: 10.1016/j.engappai.2011.04.011
– ident: ref5
  doi: 10.1016/S0191-2615(03)00015-8
– ident: ref42
  doi: 10.1109/ACC.2016.7525014
– ident: ref41
  doi: 10.1109/TCYB.2019.2904742
– year: 2018
  ident: ref9
  publication-title: Reinforcement Learning An Introduction
– ident: ref21
  doi: 10.1016/B978-1-55860-307-3.50049-6
– year: 2016
  ident: ref45
  publication-title: OpenAI Gym
– start-page: 6379
  year: 2017
  ident: ref34
  article-title: Multi-agent actor-critic for mixed cooperative-competitive environments
  publication-title: Proc Adv Neural Inf Process Syst
– start-page: 222
  year: 2016
  ident: ref28
  article-title: Deep reinforcement learning with double Q-learning
  publication-title: Proc 13th AAAI Conf Artif Intell
– start-page: 43
  year: 2018
  ident: ref43
  article-title: Monaco SUMO Traffic (MoST) scenario: A 3D mobility scenario for cooperative ITS
  publication-title: Proc SUMO User Conf Simulating Auton Intermodal Transp Syst
– ident: ref3
  doi: 10.1049/iet-its.2015.0108
– ident: ref13
  doi: 10.1038/nature14236
– year: 2017
  ident: ref16
  publication-title: Deep deterministic policy gradient for urban traffic light control
– volume: 70
  start-page: 3540
  year: 2017
  ident: ref47
  article-title: Feudal networks for hierarchical reinforcement learning
  publication-title: Proc 34th Int Conf Mach Learn
– year: 2004
  ident: ref10
  article-title: Intelligent traffic light control
– ident: ref23
  doi: 10.1109/TITS.2019.2901791
– ident: ref1
  doi: 10.1145/3068287
– volume: 9
  start-page: 148
  year: 2012
  ident: ref18
  article-title: A distributed approach for coordination between traffic lights based on game theory
  publication-title: Int Arab J Inf Technol
– ident: ref39
  doi: 10.1109/TITS.2013.2255286
– ident: ref48
  doi: 10.1016/j.trc.2013.08.014
– year: 2008
  ident: ref4
  article-title: Traffic signal timing manual
– start-page: 746
  year: 1998
  ident: ref17
  article-title: The dynamics of reinforcement learning in cooperative multiagent systems
  publication-title: Proc 15th Nat Conf Artif Intell/10th Conf Innov Appl Artif Intell
– ident: ref32
  doi: 10.1109/ITSC.2014.6958095
– start-page: 656
  year: 2008
  ident: ref22
  article-title: Multiagent reinforcement learning for urban traffic control using coordination graphs
  publication-title: Machine Learning and Knowledge Discovery in Databases
– ident: ref25
  doi: 10.1023/A:1013689704352
– ident: ref11
  doi: 10.1109/TITS.2010.2091408
– ident: ref40
  doi: 10.1016/j.trpro.2015.09.070
– ident: ref20
  doi: 10.1109/ITSC.2011.6083114
– ident: ref31
  doi: 10.1016/B978-1-55860-335-6.50027-1
– year: 2015
  ident: ref14
  publication-title: Continuous control with deep reinforcement learning
– ident: ref8
  doi: 10.1109/TITS.2006.874716
– ident: ref44
  doi: 10.1109/AMS.2014.16
– ident: ref36
  doi: 10.1162/neco.1994.6.6.1185
– ident: ref33
  doi: 10.1007/s10458-008-9046-9
– ident: ref46
  doi: 10.1109/CVPR.2018.00493
SSID ssj0000816898
Score 2.602133
Snippet Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multiagent reinforcement learning...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 174
SubjectTerms Algorithms
Convergence
Double estimators
Games
Learning
Learning (artificial intelligence)
Markov processes
mean-field approximation
multiagent reinforcement learning (MARL)
Multiagent systems
Nash equilibrium
Simulators
Traffic control
Traffic flow
traffic signal control (TSC)
Traffic signals
Title Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning
URI https://ieeexplore.ieee.org/document/9186324
https://www.ncbi.nlm.nih.gov/pubmed/32881705
https://www.proquest.com/docview/2472317112
https://www.proquest.com/docview/2440470329
Volume 51
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1Lb9QwEB61PXEB-oKUUrkSh4Lw1rGdjXOkq1YVoj30IZVTFL8qRLVb0V0O_HpmHG8kUIu4WYqdh2ccfzOe-QbgnbNGaC8j92PTcG1Cx804NhzNW-WUsj4mh_7Z-fj0Wn--qW5W4OOQCxNCSMFnYUTNdJbvZ25BrrLDpjTELr4Kq2i49blagz8lFZBIpW8lNjiiijofYpaiObyafD1CY1CijYr7nympQIyShtjpqj92pFRi5Wm0mXadkxdwtnzfPtjk-2gxtyP36y8qx__9oJfwPMNP9qnXl3VYCdMNWM8L_IEdZBbq95tw8YVixPklyjAw3NGIaoJdfrul4ZM-vp2leAPWsfPZz3DHUi5vR6la7CIkQlaXfI8sc7jebsH1yfHV5JTnAgzcKd3Mua8raceEObRrhJUqis6WDhGBd45SVI2pgnABUUGHlg6CzxhDiJ3yWlpfWbUNa9PZNLwG5r1GS7IydfRRRyoiqKKNvo5ROl1KUYBYCqF1mZ2cimTctclKEU1LImxJhG0WYQEfhiH3PTXHvzpv0vQPHfPMF7C7lHSbF-9DK3WNqLdGJFrA_nAZlx2dpXTTMFtQHy00_i1lU8CrXkOGey8Va-fxZ76BZ5ICY5IfZxfW5j8W4S0im7ndSyr9G1et8BY
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwzV1fb9MwED-NIcFegDEGYQOMBBIguXNtp3EeeIDC1LGuD1snjacs_jchphbRFjQ-C1-F78bZcSOBgLdJvEWKnSi-i-9357vfATwxWjFpuae2p0oqlaup6vmSonsrjBDa-hjQPxj1Bsfy3Ul-sgLf21oY51xMPnOdcBnP8u3ULEKobKfsqsAunlIo993FV3TQZi_33qA0n3K--3bcH9DUQ4AaIcs5tUXOdS-YTWlKprnwrNZdg0bNGhOqLJXKHTMODVuNYB3xk_fO-VpYybXNtcDnXoGriDNy3lSHtRGc2LIiNtvleEERxxTp2LTLyp1x__1rdD85esVocfFFa3BNcBX48PJfbGBs6vJ3fBvt3O5N-LFcoSa95WNnMdcd8-038sj_dQlvwY0EsMmr5o9YhxU3uQ3raQubkWeJZ_v5BhwOQxY8PUItdQRtdiDTIEcfzsL0fpPBT2JGBanJaPrFnZNYrVyHYjRy6CLlrInRVZJYas_uwPGlfNsmrE6mE3cPiLUSfeVcFd566UObROG1t4X33MguZxmwpdArk_jXQxuQ8yr6YaysgspUQWWqpDIZvGinfGrIR_41eCOIux2YJJ3B9lKzqrQ9zSouC8T1BWLtDB63t3FjCadF9cRNF2GMZBLtAS8zuNtoZPvspSLf__M7H8H1wfhgWA33RvtbsMZDGlCMWm3D6vzzwj1AHDfXD-PvROD0spXvJwkbTY8
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Large-Scale+Traffic+Signal+Control+Using+a+Novel+Multiagent+Reinforcement+Learning&rft.jtitle=IEEE+transactions+on+cybernetics&rft.au=Wang%2C+Xiaoqiang&rft.au=Ke%2C+Liangjun&rft.au=Qiao%2C+Zhimin&rft.au=Chai%2C+Xinghua&rft.date=2021-01-01&rft.issn=2168-2275&rft.eissn=2168-2275&rft.volume=51&rft.issue=1&rft.spage=174&rft_id=info:doi/10.1109%2FTCYB.2020.3015811&rft.externalDBID=NO_FULL_TEXT
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2168-2267&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2168-2267&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2168-2267&client=summon