Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning
Published in | IEEE Transactions on Cybernetics, Vol. 51, No. 1, pp. 174–187
Main Authors | Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai
Format | Journal Article
Language | English
Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), January 2021
Subjects | Algorithms; Convergence; Double estimators; Games; Learning (artificial intelligence); Markov processes; Mean-field approximation; Multiagent reinforcement learning (MARL); Multiagent systems; Nash equilibrium; Simulators; Traffic control; Traffic flow; Traffic signal control (TSC); Traffic signals
Abstract | Finding the optimal signal timing strategy is a difficult task for large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method for this problem, but existing approaches leave room for improvement both in scaling to large problems and in modeling, for each individual agent, the behavior of the other agents. In this article, a new MARL algorithm, cooperative double Q-learning (Co-DQL), is proposed, with several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, so that agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method, and we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
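The abstract names two standard ingredients behind Co-DQL's per-agent learner: double estimators (double Q-learning) to remove the maximization bias that causes overestimation, and a UCB policy to keep exploring. The tabular sketch below shows how these two generic pieces fit together; it illustrates the standard techniques only, not the paper's actual Co-DQL implementation, and the constants `ALPHA`, `GAMMA`, and `UCB_C` are assumed placeholders rather than the paper's hyperparameters.

```python
import numpy as np
from collections import defaultdict

# Hypothetical constants; the paper's exact hyperparameters are not given here.
ALPHA, GAMMA, UCB_C = 0.1, 0.95, 1.0

class DoubleQLearnerUCB:
    """Tabular double Q-learning with a UCB exploration policy.

    Two estimators Q_A and Q_B are kept. Each update randomly picks one
    table to update, selecting the greedy next action with that table but
    evaluating it with the other, which removes the maximization bias
    (overestimation) of standard independent Q-learning.
    """

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q_a = defaultdict(lambda: np.zeros(n_actions))
        self.q_b = defaultdict(lambda: np.zeros(n_actions))
        self.counts = defaultdict(lambda: np.zeros(n_actions))  # visit counts for UCB
        self.t = 0  # total number of action selections so far

    def act(self, state):
        """UCB rule: exploit the averaged estimators plus an exploration
        bonus that shrinks as a (state, action) pair is visited more."""
        self.t += 1
        q_mean = (self.q_a[state] + self.q_b[state]) / 2.0
        bonus = UCB_C * np.sqrt(np.log(self.t + 1) / (self.counts[state] + 1e-8))
        action = int(np.argmax(q_mean + bonus))
        self.counts[state][action] += 1
        return action

    def update(self, s, a, r, s_next):
        # Flip a fair coin to decide which estimator to update.
        if np.random.rand() < 0.5:
            q_upd, q_eval = self.q_a, self.q_b
        else:
            q_upd, q_eval = self.q_b, self.q_a
        a_star = int(np.argmax(q_upd[s_next]))       # greedy action under the updated table...
        target = r + GAMMA * q_eval[s_next][a_star]  # ...evaluated with the other table
        q_upd[s][a] += ALPHA * (target - q_upd[s][a])
```

In a TSC setting, each intersection would hold one such learner, with states being hashable local observations (e.g., discretized queue lengths) and actions being signal phases.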
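The mean-field approximation the abstract refers to reduces each agent's many-agent interaction to an interaction with the average behavior of its neighbors, so a Q-function is conditioned on the agent's own action and the neighborhood's mean action instead of the full joint action. A minimal sketch of that reduction, with a hypothetical helper name and assuming discrete signal phases encoded one-hot:

```python
import numpy as np

def neighbor_mean_action(neighbor_actions, n_actions):
    """Summarize the neighbors' joint action as the empirical action
    distribution, i.e., the mean of their one-hot encoded actions."""
    one_hot = np.eye(n_actions)[np.asarray(neighbor_actions)]  # shape (k, n_actions)
    return one_hot.mean(axis=0)                                # shape (n_actions,)

# Example: an intersection with 4 phases whose three neighbors chose
# phases 0, 2, and 2 is modeled as facing the mean action below, so the
# agent evaluates Q(s, a, mean_action) rather than Q(s, a_1, ..., a_N).
mean_action = neighbor_mean_action([0, 2, 2], n_actions=4)
print(mean_action)  # [0.3333 0.     0.6667 0.    ]
```

This is what makes the approach scale: the input to each agent's Q-function stays fixed-size regardless of how many intersections the network contains.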
Authors | Xiaoqiang Wang (ORCID 0000-0002-3783-1268; wangxq5127@stu.xjtu.edu.cn), Liangjun Ke (ORCID 0000-0002-2920-0853; keljxjtu@xjtu.edu.cn), and Zhimin Qiao (ORCID 0000-0001-5829-4353; qiao.miracle@gmail.com), State Key Laboratory for Manufacturing Systems Engineering, School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, China; Xinghua Chai (cetc54008@yeah.net), CETC Key Laboratory of Aerospace Information Applications, Shijiazhuang, China
CODEN | ITCEB8 |
Copyright | The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
DOI | 10.1109/TCYB.2020.3015811 |
Discipline | Sciences (General) |
EISSN | 2168-2275 |
EndPage | 187 |
Genre | Original Research; Journal Article
Funding | National Natural Science Foundation of China (Grants 61973244 and 61573277)
ISSN | 2168-2267
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PMID | 32881705 |
PageCount | 14 |
PublicationDate | January 2021
PublicationPlace | Piscataway, United States
PublicationTitle | IEEE Transactions on Cybernetics
PublicationTitleAbbrev | TCYB |
PublicationTitleAlternate | IEEE Trans Cybern |
PublicationYear | 2021 |
Publisher | IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
StartPage | 174 |
Title | Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning |
URI | https://ieeexplore.ieee.org/document/9186324 https://www.ncbi.nlm.nih.gov/pubmed/32881705 https://www.proquest.com/docview/2472317112 https://www.proquest.com/docview/2440470329 |
Volume | 51 |