A review of cooperative multi-agent deep reinforcement learning

Bibliographic Details
Published in Applied Intelligence (Dordrecht, Netherlands), Vol. 53, no. 11, pp. 13677-13722
Main Authors Oroojlooy, Afshin; Hajinezhad, Davood
Format Journal Article
Language English
Published New York: Springer US; Springer Nature B.V., 01.06.2023

Abstract Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. The aim of this review article is to provide an overview of recent approaches to Multi-Agent Reinforcement Learning (MARL) algorithms. Our classification of MARL approaches includes five categories for modeling and solving cooperative multi-agent reinforcement learning problems: (I) independent learners, (II) fully observable critics, (III) value function factorization, (IV) consensus, and (V) learn to communicate. We first discuss each of these methods, their potential challenges, and how these challenges were mitigated in the relevant papers. Additionally, we make connections among different papers in each category where applicable. Next, we cover some newly emerging research areas in MARL along with the relevant recent papers. In light of MARL’s recent success in real-world applications, we dedicate a section to reviewing these applications and articles. This survey also provides a list of available environments for MARL research. Finally, the paper concludes with proposals on possible research directions.
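To make the abstract's taxonomy concrete, the sketch below (a toy Python illustration added to this record, not code from the article; the environment interface, hyperparameters, and class name are assumptions) shows the idea behind category (I), independent learners: each agent runs its own Q-learning update on its local observation and treats the other agents as part of a non-stationary environment.

# Minimal sketch of an "independent learner" (category I), assuming a discrete
# action space and hashable per-agent observations; illustrative only.
import random
from collections import defaultdict

class IndependentQLearner:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        # One private Q-table per agent; no access to other agents' policies.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def act(self, obs):
        # Epsilon-greedy choice over this agent's own local observation.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[obs][a])

    def update(self, obs, action, reward, next_obs):
        # Standard single-agent Q-learning target; the other agents' behavior
        # enters only through reward and next_obs, which is the source of the
        # non-stationarity discussed for this category.
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])

# Hypothetical usage with an environment that exposes per-agent observations,
# actions, and rewards (interface assumed):
# agents = [IndependentQLearner(n_actions=4) for _ in range(n_agents)]
# actions = [ag.act(ob) for ag, ob in zip(agents, observations)]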
Author Oroojlooy, Afshin (SAS Institute Inc; ORCID 0000-0001-7829-6145; email: oroojlooy@gmail.com)
Hajinezhad, Davood (SAS Institute Inc)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
DOI 10.1007/s10489-022-04105-y
Discipline Computer Science
EISSN 1573-7497
EndPage 13722
ISSN 0924-669X
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords Cooperative learning; Reinforcement learning; Multi-agent systems
Language English
ORCID 0000-0001-7829-6145
PageCount 46
PublicationDate 2023-06-01
PublicationPlace New York
PublicationSubtitle The International Journal of Research on Intelligent Systems for Real Life Complex Problems
PublicationTitle Applied intelligence (Dordrecht, Netherlands)
PublicationTitleAbbrev Appl Intell
PublicationYear 2023
Publisher Springer US
Springer Nature B.V
– reference: Sukhbaatar S, Fergus R et al (2016) Learning multiagent communication with backpropagation
– reference: Cáp M, Novák P, Seleckỳ M, Faigl J, Jiff V. (2013) Asynchronous decentralized prioritized planning for coordination in multi-robot system. In: IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3822–3829
– reference: Huang G, Liu Z, Maaten LVD, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
– reference: Charles B, Joel Z L, Denis T, Tom W, Marcus W, Heinrich K, Andrew L, Simon G, Víctor V, Amir S et al (2016) Deepmind lab. arXiv:1612.03801
– reference: Foerster J, Assael IA, Freitas Nando de, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. Adv Neural Inf Process Syst:2137–2145
– reference: Shuo J (2019) Multi-Agent Reinforcement Learning Environment. https://github.com/Bigpig4396/Multi-Agent-Reinforcement-Learning-Environmenthttps://github.com/Bigpig4396/Multi-Agent-Reinforcement-Learning-Environment Accessed 2019-07-28
– reference: LaValle SM (2006) Planning algorithms. Cambridge university press
– reference: Wei H, Nan X u, Zhang H, Zheng G, Zang X, Chen C, Zhang W, Zhu Y, Xu K, Li Z (2019b) Colight: Learning network-level cooperation for traffic signal control. In: Proceedings of the 28th ACM international conference on information and knowledge management, pp 1913–1922
– reference: Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature (7540):529–533
– reference: Moerland TM, Broekens J, Jonker CM (2020) Model-based reinforcement learninga survey. arXiv:2006.16712
– reference: DingZHuangTZongqingLuLearning individually inferred communication for multi-agent cooperationAdv Neural Inf Process Syst2020332206922079
– reference: HochreiterSSchmidhuberJLong short-term memoryNeural comput1997981735178010.1162/neco.1997.9.8.1735
– reference: Leibo JZ, Zambaldi V, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th conference on autonomous agents and multiagent systems. International foundation for autonomous agents and multiagent systems, pp 464?473
– reference: Adrian K A, Kagan T (2004) Unifying temporal and structural credit assignment problems
– reference: ML2 (2021) Marlenv, multi-agent reinforcement learning environment. http://github.com/kc-ml2/marlenv. Accessed 12 March 2020
– reference: Smierzchalski R, Michalewicz Z (2005) Path planning in dynamic environments. In: Innovations in robot mobility and control. Springer, pp 135–153
– reference: Lowe R, Yi W, Tamar A, Harb J, Abbeel OpenAI P., Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, pp 6382–6393
– reference: WangHWangXHuXZhangXGuMA multi-agent reinforcement learning approach to dynamic service compositionInf Sci20163639611910.1016/j.ins.2016.05.002
– reference: Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations
– reference: OroojlooyjadidANazariMSnyderLTakáčMA deep q-network for the beer game: deep reinforcement learning for inventory optimizationManuf Serv Oper Manag201700null, 0https://doi.org/10.1287/msom.2020.0939.
– reference: KroeseDPRubinsteinRYMonte carlo methodsWiley Interdiscip Rev Comput Stat201241485810.1002/wics.194
– reference: Leroy S, Laumond J-P, Siméon T (1999) Multiple path coordination for mobile robots A geometric algorithm. In: IJCAI, vol 99 pp 1118–1123
– reference: Wang J, Xu W, Gu Y, Song W, Green TC (2021) Multi-agent reinforcement learning for active voltage control on power distribution networks. In: Beygelzimer A, Dauphin y , Liang P, Vaughan JW (eds) Advances in neural information processing systems. https://openreview.net/forum?id=hwoK62_GkiT. Accessed 23 Jan 2022
– reference: Freed B, Sartoretti G, Jiaheng H, Choset H (2020) Communication learning via backpropagation in discrete channels with unknown noise. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 7160–7168
– reference: BowlingMVelosoMMultiagent learning using a variable learning rateArtif Intell2002136221525018958190995.6807510.1016/S0004-3702(02)00121-2
– reference: Sun W, Jiang N, Krishnamurthy A, Agarwal A, Langford J (2019) Model-based rl in contextual decision processes: pac bounds and exponential improvements over model-free approaches. In: Conference on learning theory. PMLR, pp 2898–2933
– reference: Usunier N, Synnaeve G, Lin Z, Chintala S (2017) Episodic exploration for deep deterministic policies for starcraft micromanagement. In: International conference on learning representations. https://openreview.net/forum?id=r1LXit5ee. Accessed 28 July 2019
– reference: LussangeJLazarevichIBourgeois-GirondeSPalminteriSGutkinBModelling stock markets by multi-agent reinforcement learningComput Econ202157111314710.1007/s10614-020-10038-w
– reference: Samvelyan M, Rashid T, de Witt CS, Farquhar G, Nardelli N, Rudner TGJ, Hung Ch-M, Torr PHS, Foerster J, Whiteson S (2019a) The StarCraft multi-agent challenge. arXiv:1902.04043
– reference: David S, Guy L, Nicolas H, Thomas D, Daan W, Martin R (2014) Deterministic policy gradient algorithms. In: Xing EP, Jebara T (eds) Proceedings of the 31st international conference on machine learning, volume 32 of proceedings of machine learning research. PMLR, Bejing, 22–24 Jun, pages 387–395, http://proceedings.mlr.press/v32/silver14.html. Accessed 28 July 2019
– reference: SethAShermanMReinboltJADelpSLOpensim: a musculoskeletal modeling and simulation framework for in silico investigations and exchangeProcedia Iutam2011221223210.1016/j.piutam.2011.04.021
– reference: KoberJAndrew BagnellJJan PReinforcement learning in robotics a surveyInt J Robot Res201332111238127410.1177/0278364913495721
– reference: SilverDSchrittwieserJSimonyanKAntonoglouIHuangAGuezAHubertTBakerLLaiMBoltonAMastering the game of go without human knowledgeNature2017550767635410.1038/nature24270
– reference: LiuYChenYJiangTDynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approachEur J Oper Res2020283116618140499851431.9005310.1016/j.ejor.2019.10.049
– reference: Marc B, Peng W (2019) Autonomous air traffic controller: a deep multi-agent reinforcement learning approach. In: Reinforcement learning for real life workshop in the 36th international conference on machine learning, long beach
– reference: Suarez J, Du Y, Isola P, Mordatch I (2019) Neural mmo: a massively multiagent game environment for training and evaluating intelligent agents. arXiv:1903.00784
– reference: ZhangKKoppelAZhuHBasarTGlobal convergence of policy gradient methods to (almost) locally optimal policiesSIAM J Control Optim20205863586361241829001451.9337910.1137/19M1288012
– reference: Petersen K (2012) Termes: an autonomous robotic system for three-dimensional collective construction. Robot: Sci Syst VII, pp 257
– reference: MatignonLLaurentGJFort-PiatNLIndependent reinforcement learners in cooperative markov games: a survey regarding coordination problemsKnowl Eng Rev20122113110.1017/S0269888912000057
– reference: SuttonRSRupam MahmoodAWhiteMAn emphatic approach to the problem of off-policy temporal-difference learningJ Mach Learn Res20161712603263135170961360.68712
– reference: Eric J, Gu S, Poole B (2016) Categorical reparameterization with gumbel-softmax. In: ICLR
– reference: Norén JFW (2020) Derk gym environment. https://gym.derkgame.com. Accessed 01 Sept 2021
– reference: SartorettiGKerrJShiYWagnerGKumarTKSKoenigSChosetHPrimal: pathfinding via reinforcement and imitation multi-agent learningIEEE Robot Auto Lett2019432378238510.1109/LRA.2019.2903261
– reference: Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp 1057–1063
– reference: Zhang S, Sutton RS (2017) A deeper look at experience replay. arXiv:1712.01275
– reference: AndriotisCPPapakonstantinouKGManaging engineering systems with large state and action spaces through deep reinforcement learningReliab Eng Syst201919110648310.1016/j.ress.2019.04.036
– reference: Wei H, Chen C, Zheng G, Kan W, Gayah V, Xu K, Li Z (2019a) Presslight: learning max pressure control to coordinate traffic signals in arterial network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, KDD ’19, pp 1290–1298
– reference: Guillaume S, Yue W, William P, TK SK, Sven K, Howie C (2019b) Distributed reinforcement learning for multi-robot decentralized collective construction. In: Distributed Autonomous Robotic Systems. Springer, pp 35–49
– reference: StonePVelosoMMultiagent systems: a survey from a machine learning perspectiveAuton Robot20008334538310.1023/A:1008942012299
– reference: Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Gonzalez J, Goldberg K, Stoica I (2017) Ray RLlib: a composable and scalable reinforcement learning library. In: Deep reinforcement learning symposium (DeepRL @ NeurIPS)
– reference: GarcıaJFernándezFA comprehensive survey on safe reinforcement learningJ Mach Learn Res20151611437148034177871351.68209
– reference: Greg B, Vicki C, Ludwig P, Jonas S, John S, Jie T, Wojciech Zaremba (2016) Openai gym
– reference: Kulkarni TD, Narasimhan K, Saeedi A, Josh T. (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems, pages 3675–3683
– reference: Ng AY, Harada D, Russell SJ (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the sixteenth international conference on machine learning, ISBN 1-55860-612-2. ICML ’99, pp 278–287. http://dl.acm.org/citation.cfm?id=645528.657613. Morgan Kaufmann Publishers Inc., CA. Accessed 28 July 2019.
– reference: Mao H, Zhang Z, Xiao Z, Gong Z (2019) Modelling the dynamic joint policy of teammates with attention multi-agent ddpg. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems. International foundation for autonomous agents and multiagent systems, pp 1108–1116
– reference: Makar R, Mahadevan S, Ghavamzadeh M (2001) Hierarchical multi-agent reinforcement learning. In: Proceedings of the fifth international conference on autonomous agents. ACM, pp 246–253
– reference: PrashanthLABhatnagarSReinforcement learning with function approximation for traffic signal controlIEEE Trans Intell Transp Syst2010122412421
– reference: Yu H (2015) On convergence of emphatic temporal-difference learning. In: Conference on learning theory, pp 1724–1751
– reference: PadullaparthiVRNagarathinamSVasanAMenonVSudarsanamDFalcon-farm level control for wind turbines using multi-agent deep reinforcement learningRenew Energy202218144545610.1016/j.renene.2021.09.023
– reference: Ian X (2018) A distributed reinforcement learning solution with knowledge transfer capability for a bike rebalancing problem. arXiv:1810.04058
– reference: Zuo X (2018) Mazelab: a customizable framework to create maze and gridworld environments. https://github.com/zuoxingdong/mazelab. Accessed 28 July 2019
– reference: SongYWojcickiALukasiewiczTWangJAryanAXuZXuMDingZWuLArena: a general evaluation platform and building toolkit for multi-agent intelligenceProc AAAI Conf Artif Intell2020340572537260https://doi.org/10.1609/aaai.v34i05.6216. https://ojs.aaai.org/index.php/AAAI/article/view/6216
– reference: Sorensen J, Mikkelsen R, Henningson D, Ivanell S, Sarmast S, Andersen S (2015) Simulation of wind turbine wakes using the actuator line technique. Philosophical Trans Series Math Phys Eng Sci. vol 373(02). https://doi.org/10.1098/rsta.20140071
– reference: Monireh A, Nasser M, Ana LC B (2011) Traffic light control in non-stationary environments based on multi agent q-learning. In: 2011 14th international IEEE conference on intelligent transportation systems (ITSC), pp 1580–1585. https://doi.org/10.1109/ITSC.2011.6083114
– reference: Bouton M, Farooq H, Forgeat J, Bothe S, Shirazipour M, Karlsson P (2021) Coordinated reinforcement learning for optimizing mobile networks, arXiv:2109.15175
– reference: Mohanty S, Nygren E, Laurent F, Schneider M, Scheller C, Bhattacharya N, Watson J, Egli A, Eichenberger C, Baumberger C et al (2020) Flatland-rl: Multi-agent reinforcement learning on trains. arXiv:2012.05893
– reference: TampuuAMatiisenTKodeljaDKuzovkinIKorjusKAruJAruJVicenteRMultiagent cooperation and competition with deep reinforcement learningPlos one2017124e017239510.1371/journal.pone.0172395
– reference: Andreas J, Rohrbach M, Darrell T, Klein D (2016) Neural module networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 39–48
– reference: BorkarVSMeynSPThe o.d.e. method for convergence of stochastic approximation and reinforcement learningSIAM J Control Optim200038244746917411480990.6207110.1137/S0363012997331639
– reference: Weiß G (1995) Distributed reinforcement learning. In: Luc Steels (ed) The Biology and technology of intelligent autonomous agents, pp 415–428. Berlin, Heidelberg. Springer Berlin Heidelberg
– reference: Panerati J, Zheng H, Zhou SQ, Xu J, Prorok A, Schoellig AP (2021) Learning to fly–a gym environment with pybullet physics for reinforcement learning of multiagent quadcopter control. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 7512– 7519
– reference: Ma H, Harabor D, Stuckey PJ, Li J, Koenig S (2019) Searching with consistent prioritization for multi-agent path finding. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 7643–7650
– reference: Hestenes MR, Stiefel E et al (1952) Methods of conjugate gradients for solving linear systems. NBS Washington, DC, vol 49
– reference: Fuji T, Ito K, Matsumoto K, Yano K (2018) Deep multi-agent reinforcement learning using dnn-weight evolution to optimize supply chain performance. In: Proceedings of the 51st Hawaii international conference on system sciences, vol 8
– reference: Max B, Guni S, Roni S, Ariel F (2014) Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In: Seventh annual symposium on combinatorial search. Citeseer
– reference: Lipton ZC, Gao J, Li L, Li X, Ahmed F, Li D (2016) Efficient exploration for dialog policy learning with deep bbq networks & replay buffer spiking. coRR abs/1608.05081
– reference: WangXWangHQiCMulti-agent reinforcement learning based maintenance policy for a resource constrained flow line systemJ Intell Manuf201627232533310.1007/s10845-013-0864-5
– reference: Berg JVD, Guy SJ, Lin M, Manocha D (2011) Reciprocal n-body collision avoidance. In: Robotics research. Springer pp 3–19
– reference: Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning-volume 70. JMLR. org, pp 2681–2690
– reference: BellemareMGNaddafYVenessJBowlingMThe arcade learning environment: an evaluation platform for general agentsJ Artif Intell Res20134725327910.1613/jair.3912
– reference: YousefiNTsianikasSCoitDWReinforcement learning for dynamic condition-based maintenance of a system with individually repairable componentsQual Eng202032338840810.1080/08982112.2020.1766692
– reference: Macua SV, Tukiainen A, Hernández DG-O, Baldazo D, de Cote EM, Zazo S (2018) Diff-dac: Distributed actor-critic for average multitask deep reinforcement learning. In: Adaptive learning agents (ALA) conference
– reference: LorenzoPDScutari GNext: in-network nonconvex optimizationIEEE Trans Signal Inf Process Over Netw201622120136355596210.1109/TSIPN.2016.2524588
– reference: Arthur J, Berges V-P, Vckay E, Gao Y, Henry H, Mattar M, Lange D (2018) Unity: a general platform for intelligent agents. arXiv:1809.02627
– reference: Donghwan L, Hyung-Jin Y, Naira H (2018) Primal-dual algorithm for distributed reinforcement learning: distributed GTD. 2018 IEEE Conf Decis Control (CDC):1967–1972
– reference: Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, Yi Y (2019) Learning to schedule communication in multi-agent reinforcement learning. In: International conference on learning representations. https://openreview.net/forum?id=SJxu5iR9KQ. Accessed 07 March 2020
– reference: Mordatch I, Abbeel P (2018a) Emergence of grounded compositional language in multi-agent populations. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
– reference: Sergio ValcarcelMJianshu CSantiago ZAli H SDistributed policy evaluation under multiple behavior strategiesIEEE Trans Auto Cont20156051260127433514101360.6871410.1109/TAC.2014.2368731https://doi.org/10.1109/TAC.2014.2368731
– reference: Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. In: Advances in neural information processing systems, pp 4190– 4203
– reference: ZhangHJiangHLuoYXiaoGData-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning methodIEEE Trans Ind Electron20176454091410010.1109/TIE.2016.2542134https://doi.org/10.1109/TIE.2016.2542134
– reference: WilliamsRSimple statistical gradient-following algorithms for connectionist reinforcement learningMach Learn199283-42292560772.6807610.1007/BF00992696
– reference: SilvaFLDCostaAHRA survey on transfer learning for multiagent reinforcement learning systemsJ Artif Intell Res20196464570339325591489.6822110.1613/jair.1.11396
– reference: Pan L, Cai Q, Meng Q, Chen W, Huang L (2020) Reinforcement learning with dynamic boltzmann softmax updates. In: Christian Bessiere (ed) Proceedings of the twenty-ninth international joint conference on artificial intelligence. International joint conferences on artificial intelligence organization, IJCAI-20, Main track, pp 1992–1998. https://doi.org/10.24963/ijcai.2020/276
– reference: Jan B, Steven Morad JG, Qingbiao L, Amanda P (2021) A framework for real-world multi-robot systems running decentralized gnn-based policies. arXiv:2111.01777
– reference: Lauer M, Riedmiller M (2000) An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the seventeenth international conference on machine learning. Citeseer
– reference: Raghuram Bharadwaj D, D Sai Koti R, Prabuchandran KJ, Shalabh B. (2019) Actor-critic algorithms for constrained multi-agent reinforcement learning. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems. Richland, SC, AAMAS?19. International foundation for autonomous agents and multiagent systems, pp 1931–1933
– reference: Son K, Kim D, Kang WJ, Hostallero ED, Yi Y (2019) Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 31st international conference on machine learning, proceedings of machine learning research. PMLR
– reference: William F, Prajit R, Rishabh A, Yoshua B, Hugo L, Mark R, Will D (2020) Revisiting fundamentals of experience replay. In: International conference on machine learning. PMLR, pp 3061–3071
– reference: Zhang C, Lesser V, Shenoy P (2009) A multi-agent learning approach to online distributed resource allocation. In: Twenty-first international joint conference on artificial intelligence
– reference: SchmidtMRouxNLBachFMinimizing finite sums with the stochastic average gradientMath Program20171621-28311236129331358.9007310.1007/s10107-016-1030-6
– reference: Lazaric A (2012) Transfer in reinforcement learning: a framework and a survey. In: Reinforcement learning. Springer, pp 143–173
– reference: SuttleWYangZZhangKWangZBaşarTLiuJA multi-agent off-policy actor-critic algorithm for distributed reinforcement learningIFAC-PapersOnLine. ISSN 2405-8963. 21th IFAC World Congress202053215491554https://doi.org/10.1016/j.ifacol.2020.12.2021. https://www.sciencedirect.com/science/article/pii/S2405896320326562
– reference: Sukhbaatar S, Szlam A, Synnaeve G, Chintala S, Fergus R (2015) Mazebase: a sandbox for learning from games. arXiv:1511.07401
– reference: Yang Z, Zhang K, Hong M (2018b) Tamer başar. A finite sample analysis of the actor-critic algorithm. In: IEEE Conference on decision and control (CDC). IEEE, pp 2759–2764
– reference: Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018a) Mean field multi-agent reinforcement learning. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, vol 80. of Proceedings of machine learning research, pp 5571–5580. Stockholmsmassan, Stockholm Sweden, 10–15 Jul PMLR
– reference: Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. Adv Neural Inf Process Syst. Deep learning workshop
– reference: LinL-JSelf-improving reactive agents based on reinforcement learning, planning and teachingMach Learn199283-429332110.1007/BF00992699
– reference: Liu B, Cai Q, Yang Z, Wang Z (2019) Neural trust region/proximal policy optimization attains globally optimal policy. In: Wallach H, Larochelle H, Beygelzimer A, d’ Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems. Curran Associates Inc., volume 32. https://proceedings.neurips.cc/paper/2019/file/227e072d131ba77451d8f27ab9afdfb7-Paper.pdf. Accessed 12 Apr 2020
– reference: Sam D, Daniel K (2011) Theoretical considerations of potential-based reward shaping for multi-agent systems. In: The 10th international conference on autonomous agents and multiagent systems-volume 1. International foundation for autonomous agents and multiagent systems, pp 225–232
– reference: Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330–337
– reference: NguyenTTNguyenNDNahavandiSDeep reinforcement learning for multiagent systems: a review of challenges, solutions, and applicationsIEEE Trans Cybern20205093826383910.1109/TCYB.2020.2977374
– reference: PennesiPPaschalidisICA distributed actor-critic algorithm and applications to mobile sensor network coordination problemsIEEE Trans Auto Cont201055249249726044301368.9002610.1109/TAC.2009.2037462ISSN 0018-9286. https://doi.org/10.1109/TAC.2009.2037462
– reference: LeeJParkJJangminOLeeJHongEA multiagent approach to q-learning for daily stock tradingIEEE Trans Syst Man Cybern A: Syst Hum200737686487710.1109/TSMCA.2007.904825
– reference: Bei P, Tabish R, Christian Schroeder de W, Pierre-Alexandre K, Philip T, Wendelin B, Shimon W (2021) Facmac: Factored multi-agent centralised policy gradients. Adv Neural Inf Process Syst, vol 34
– reference: Wang RE, Everett M, How JP (2019b) R-maddpg for partially observable environments and limited communication. ICML 2019 Workshop RL4reallife
– reference: Wang Y, Han B, Wang T, Dong H, Zhang C (2020c) Dop: Off-policy multi-agent decomposed policy gradients. In: International conference on learning representations
– reference: FitouhiM-CNourelfathMGershwinSBPerformance evaluation of a two-machine line with a finite buffer and condition-based maintenanceReliab Eng Syst2017166617210.1016/j.ress.2017.03.034
– reference: Kyuree AJinkyoo PCooperative zone-based rebalancing of idle overhead hoist transportations using multi-agent reinforcement learning with graph representation learningIISE Trans202100117https://doi.org/10.1080/24725854.2020.1851823 https://doi.org/10.1080/24725854.2020.1851823
– reference: Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning. PMLR, pp 1861–1870
– reference: WatkinsCJDayanPQ-learningMach Learn199283-42792920773.6806210.1007/BF00992698
– reference: ChuTWangJCodecàLLiZMulti-agent deep reinforcement learning for large-scale traffic signal controlIEEE Trans Intell Transp Syst20192131086109510.1109/TITS.2019.2901791
– reference: Jakob NF, Gregory F, Triantafyllos A, Nantas N, Shimon W (2018) Counterfactual multi-agent policy gradients. In: Thirty-second AAAI conference on artificial intelligence
– reference: Zhang C, Li X, Hao J, Chen S, Tuyls K, Xue W, Feng Z (2018a) Scc-rfmq learning in cooperative markov games with continuous actions. In: Proceedings of the 17th international Conference on Autonomous Agents and MultiAgent systems. International foundation for autonomous agents and Multiagent systems, pp 2162–2164
– reference: Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, Elibolm M, Yang Z, Paul W, Jordan M et al (2018) Ray: a distributed framework for emerging fAIg applications. In: 13th {USENIX} symposium on operating systems design and implementation (fOSDIg 18), pp 561–577
– reference: Yuxi L (2017) Deep reinforcement learning: an overview. arXiv:1701.07274
– reference: Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K et al (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, pp 2085–2087. International foundation for autonomous agents and multiagent systems
– reference: Wei H, Zheng G, Gayah V, Li Z (2019c) A survey on traffic signal control methods. arXiv:1904.08117
– reference: WangSWanJZhangDLiDZhangCTowards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordinationComput Netw201610115816810.1016/j.comnet.2015.12.017
– reference: CuiRBoGJiGPareto-optimal coordination of multiple robots with safety guaranteesAuton Robot201232318920510.1007/s10514-011-9265-9
– reference: Jiang S, Amato C (2021) Multi-agent reinforcement learning with directed exploration and selective memory reuse. In: Proceedings of the 36th annual ACM symposium on applied computing, pp 777–784
– reference: Prabuchandran KJ, Hemanth Kumar AN, Bhatnagar S (2014) Multi-agent reinforcement learning for traffic signal control. In: 17th international IEEE conference on intelligent transportation systems (ITSC). IEEE, pp 2529–2534
– reference: Lanctot M, Lockhart E, Lespiau J-B, Zambaldi V, Upadhyay S, Pérolat J, Srinivasan S, Timbers F, Tuyls K, Omidshafiei S et al (2019) Openspiel: a framework for reinforcement learning in games. arXiv:1908.09453
– reference: MirhoseiniAGoldieAYazganMJiangJSonghoriEWangSLeeY-JJohnsonEPathakONaziAA graph placement methodology for fast chip designNature2021594786220721210.1038/s41586-021-03544-w
– reference: Gabel T, Riedmiller M (2007) On a successful application of multi-agent reinforcement learning to operations research benchmarks. In: 2007 IEEE international symposium on approximate dynamic programming and reinforcement learning. IEEE, pp 68–75
– reference: Sanchez G, Latombe J-C (2002) Using a prm planner to compare centralized and decoupled planning for multi-robot systems. In: Proceedings 2002 IEEE international conference on robotics and automation (Cat. No. 02CH37292), vol 2, pp 2112–2119
– reference: Nazari M, Oroojlooy A, Snyder L, Takác M. (2018) Reinforcement learning for solving the vehicle routing problem
– reference: Lazaric A, Restelli M, Bonarini A (2008) Reinforcement learning in continuous action spaces through sequential monte carlo methods. In: Advances in neural information processing systems, pp 833–840
– reference: Kempka M, Wydmuch M, Runc G, Toczek J, Jaśkowski W (2016) ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE conference on computational intelligence and games. IEEE, Santorini, The best paper award, pp 341–348
– reference: Ryu H, Shin H, Park J (2018) Multi-agent actor-critic with generative cooperative policy network. arXiv:1810.09206
StartPage 13677
SubjectTerms Algorithms
Artificial Intelligence
Computer Science
Deep learning
Machine learning
Machines
Manufacturing
Mechanical Engineering
Multiagent systems
Processes
Title A review of cooperative multi-agent deep reinforcement learning
URI https://link.springer.com/article/10.1007/s10489-022-04105-y
https://www.proquest.com/docview/2821175683
Volume 53