A review of cooperative multi-agent deep reinforcement learning

Bibliographic Details
Published in Applied Intelligence (Dordrecht, Netherlands), Vol. 53, no. 11, pp. 13677-13722
Main Authors Oroojlooy, Afshin; Hajinezhad, Davood
Format Journal Article
Language English
Published New York: Springer US; Springer Nature B.V., 01.06.2023

Abstract Deep Reinforcement Learning has made significant progress in multi-agent systems in recent years. The aim of this review article is to provide an overview of recent approaches to Multi-Agent Reinforcement Learning (MARL) algorithms. Our classification of MARL approaches includes five categories for modeling and solving cooperative multi-agent reinforcement learning problems: (I) independent learners, (II) fully observable critics, (III) value function factorization, (IV) consensus, and (V) learn to communicate. We first discuss each of these methods, their potential challenges, and how these challenges were mitigated in the relevant papers. Additionally, we make connections among different papers in each category where applicable. Next, we cover some newly emerging research areas in MARL along with the relevant recent papers. In light of MARL’s recent success in real-world applications, we dedicate a section to reviewing these applications and articles. This survey also provides a list of available environments for MARL research. Finally, the paper concludes with proposals on possible research directions.
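To make the abstract's taxonomy concrete, the sketch below (a toy Python illustration added to this record, not code from the article; the environment interface, hyperparameters, and class name are assumptions) shows the idea behind category (I), independent learners: each agent runs its own Q-learning update on its local observation and treats the other agents as part of a non-stationary environment.

# Minimal sketch of an "independent learner" (category I), assuming a discrete
# action space and hashable per-agent observations; illustrative only.
import random
from collections import defaultdict

class IndependentQLearner:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        # One private Q-table per agent; no access to other agents' policies.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def act(self, obs):
        # Epsilon-greedy choice over this agent's own local observation.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[obs][a])

    def update(self, obs, action, reward, next_obs):
        # Standard single-agent Q-learning target; the other agents' behavior
        # enters only through reward and next_obs, which is the source of the
        # non-stationarity discussed for this category.
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])

# Hypothetical usage with an environment that exposes per-agent observations,
# actions, and rewards (interface assumed):
# agents = [IndependentQLearner(n_actions=4) for _ in range(n_agents)]
# actions = [ag.act(ob) for ag, ob in zip(agents, observations)]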
Author Oroojlooy, Afshin (SAS Institute Inc; ORCID 0000-0001-7829-6145; email: oroojlooy@gmail.com)
Hajinezhad, Davood (SAS Institute Inc)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
DOI 10.1007/s10489-022-04105-y
Discipline Computer Science
EISSN 1573-7497
EndPage 13722
ISSN 0924-669X
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords Cooperative learning; Reinforcement learning; Multi-agent systems
Language English
ORCID 0000-0001-7829-6145
PageCount 46
PublicationDate 2023-06-01
PublicationPlace New York
PublicationSubtitle The International Journal of Research on Intelligent Systems for Real Life Complex Problems
PublicationTitle Applied intelligence (Dordrecht, Netherlands)
PublicationTitleAbbrev Appl Intell
PublicationYear 2023
Publisher Springer US
Springer Nature B.V
– reference: Sukhbaatar S, Fergus R et al (2016) Learning multiagent communication with backpropagation
– reference: Cáp M, Novák P, Seleckỳ M, Faigl J, Jiff V. (2013) Asynchronous decentralized prioritized planning for coordination in multi-robot system. In: IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3822–3829
– reference: Huang G, Liu Z, Maaten LVD, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
– reference: Charles B, Joel Z L, Denis T, Tom W, Marcus W, Heinrich K, Andrew L, Simon G, Víctor V, Amir S et al (2016) Deepmind lab. arXiv:1612.03801
– reference: Foerster J, Assael IA, Freitas Nando de, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. Adv Neural Inf Process Syst:2137–2145
– reference: Shuo J (2019) Multi-Agent Reinforcement Learning Environment. https://github.com/Bigpig4396/Multi-Agent-Reinforcement-Learning-Environmenthttps://github.com/Bigpig4396/Multi-Agent-Reinforcement-Learning-Environment Accessed 2019-07-28
– reference: LaValle SM (2006) Planning algorithms. Cambridge university press
– reference: Wei H, Nan X u, Zhang H, Zheng G, Zang X, Chen C, Zhang W, Zhu Y, Xu K, Li Z (2019b) Colight: Learning network-level cooperation for traffic signal control. In: Proceedings of the 28th ACM international conference on information and knowledge management, pp 1913–1922
– reference: Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature (7540):529–533
– reference: Moerland TM, Broekens J, Jonker CM (2020) Model-based reinforcement learninga survey. arXiv:2006.16712
– reference: DingZHuangTZongqingLuLearning individually inferred communication for multi-agent cooperationAdv Neural Inf Process Syst2020332206922079
– reference: HochreiterSSchmidhuberJLong short-term memoryNeural comput1997981735178010.1162/neco.1997.9.8.1735
– reference: Leibo JZ, Zambaldi V, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th conference on autonomous agents and multiagent systems. International foundation for autonomous agents and multiagent systems, pp 464?473
– reference: Adrian K A, Kagan T (2004) Unifying temporal and structural credit assignment problems
– reference: ML2 (2021) Marlenv, multi-agent reinforcement learning environment. http://github.com/kc-ml2/marlenv. Accessed 12 March 2020
– reference: Smierzchalski R, Michalewicz Z (2005) Path planning in dynamic environments. In: Innovations in robot mobility and control. Springer, pp 135–153
– reference: Lowe R, Yi W, Tamar A, Harb J, Abbeel OpenAI P., Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, pp 6382–6393
– reference: WangHWangXHuXZhangXGuMA multi-agent reinforcement learning approach to dynamic service compositionInf Sci20163639611910.1016/j.ins.2016.05.002
– reference: Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations
– reference: OroojlooyjadidANazariMSnyderLTakáčMA deep q-network for the beer game: deep reinforcement learning for inventory optimizationManuf Serv Oper Manag201700null, 0https://doi.org/10.1287/msom.2020.0939.
– reference: KroeseDPRubinsteinRYMonte carlo methodsWiley Interdiscip Rev Comput Stat201241485810.1002/wics.194
– reference: Leroy S, Laumond J-P, Siméon T (1999) Multiple path coordination for mobile robots A geometric algorithm. In: IJCAI, vol 99 pp 1118–1123
– reference: Wang J, Xu W, Gu Y, Song W, Green TC (2021) Multi-agent reinforcement learning for active voltage control on power distribution networks. In: Beygelzimer A, Dauphin y , Liang P, Vaughan JW (eds) Advances in neural information processing systems. https://openreview.net/forum?id=hwoK62_GkiT. Accessed 23 Jan 2022
– reference: Freed B, Sartoretti G, Jiaheng H, Choset H (2020) Communication learning via backpropagation in discrete channels with unknown noise. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 7160–7168
– reference: BowlingMVelosoMMultiagent learning using a variable learning rateArtif Intell2002136221525018958190995.6807510.1016/S0004-3702(02)00121-2
– reference: Sun W, Jiang N, Krishnamurthy A, Agarwal A, Langford J (2019) Model-based rl in contextual decision processes: pac bounds and exponential improvements over model-free approaches. In: Conference on learning theory. PMLR, pp 2898–2933
– reference: Usunier N, Synnaeve G, Lin Z, Chintala S (2017) Episodic exploration for deep deterministic policies for starcraft micromanagement. In: International conference on learning representations. https://openreview.net/forum?id=r1LXit5ee. Accessed 28 July 2019
– reference: LussangeJLazarevichIBourgeois-GirondeSPalminteriSGutkinBModelling stock markets by multi-agent reinforcement learningComput Econ202157111314710.1007/s10614-020-10038-w
– reference: Samvelyan M, Rashid T, de Witt CS, Farquhar G, Nardelli N, Rudner TGJ, Hung Ch-M, Torr PHS, Foerster J, Whiteson S (2019a) The StarCraft multi-agent challenge. arXiv:1902.04043
– reference: David S, Guy L, Nicolas H, Thomas D, Daan W, Martin R (2014) Deterministic policy gradient algorithms. In: Xing EP, Jebara T (eds) Proceedings of the 31st international conference on machine learning, volume 32 of proceedings of machine learning research. PMLR, Bejing, 22–24 Jun, pages 387–395, http://proceedings.mlr.press/v32/silver14.html. Accessed 28 July 2019
– reference: SethAShermanMReinboltJADelpSLOpensim: a musculoskeletal modeling and simulation framework for in silico investigations and exchangeProcedia Iutam2011221223210.1016/j.piutam.2011.04.021
– reference: KoberJAndrew BagnellJJan PReinforcement learning in robotics a surveyInt J Robot Res201332111238127410.1177/0278364913495721
– reference: SilverDSchrittwieserJSimonyanKAntonoglouIHuangAGuezAHubertTBakerLLaiMBoltonAMastering the game of go without human knowledgeNature2017550767635410.1038/nature24270
– reference: LiuYChenYJiangTDynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approachEur J Oper Res2020283116618140499851431.9005310.1016/j.ejor.2019.10.049
– reference: Marc B, Peng W (2019) Autonomous air traffic controller: a deep multi-agent reinforcement learning approach. In: Reinforcement learning for real life workshop in the 36th international conference on machine learning, long beach
– reference: Suarez J, Du Y, Isola P, Mordatch I (2019) Neural mmo: a massively multiagent game environment for training and evaluating intelligent agents. arXiv:1903.00784
– reference: ZhangKKoppelAZhuHBasarTGlobal convergence of policy gradient methods to (almost) locally optimal policiesSIAM J Control Optim20205863586361241829001451.9337910.1137/19M1288012
– reference: Petersen K (2012) Termes: an autonomous robotic system for three-dimensional collective construction. Robot: Sci Syst VII, pp 257
– reference: MatignonLLaurentGJFort-PiatNLIndependent reinforcement learners in cooperative markov games: a survey regarding coordination problemsKnowl Eng Rev20122113110.1017/S0269888912000057
– reference: SuttonRSRupam MahmoodAWhiteMAn emphatic approach to the problem of off-policy temporal-difference learningJ Mach Learn Res20161712603263135170961360.68712
– reference: Eric J, Gu S, Poole B (2016) Categorical reparameterization with gumbel-softmax. In: ICLR
– reference: Norén JFW (2020) Derk gym environment. https://gym.derkgame.com. Accessed 01 Sept 2021
– reference: SartorettiGKerrJShiYWagnerGKumarTKSKoenigSChosetHPrimal: pathfinding via reinforcement and imitation multi-agent learningIEEE Robot Auto Lett2019432378238510.1109/LRA.2019.2903261
– reference: Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp 1057–1063
– reference: Zhang S, Sutton RS (2017) A deeper look at experience replay. arXiv:1712.01275
– reference: AndriotisCPPapakonstantinouKGManaging engineering systems with large state and action spaces through deep reinforcement learningReliab Eng Syst201919110648310.1016/j.ress.2019.04.036
– reference: Wei H, Chen C, Zheng G, Kan W, Gayah V, Xu K, Li Z (2019a) Presslight: learning max pressure control to coordinate traffic signals in arterial network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, KDD ’19, pp 1290–1298
– reference: Guillaume S, Yue W, William P, TK SK, Sven K, Howie C (2019b) Distributed reinforcement learning for multi-robot decentralized collective construction. In: Distributed Autonomous Robotic Systems. Springer, pp 35–49
– reference: StonePVelosoMMultiagent systems: a survey from a machine learning perspectiveAuton Robot20008334538310.1023/A:1008942012299
– reference: Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Gonzalez J, Goldberg K, Stoica I (2017) Ray RLlib: a composable and scalable reinforcement learning library. In: Deep reinforcement learning symposium (DeepRL @ NeurIPS)
– reference: GarcıaJFernándezFA comprehensive survey on safe reinforcement learningJ Mach Learn Res20151611437148034177871351.68209
– reference: Greg B, Vicki C, Ludwig P, Jonas S, John S, Jie T, Wojciech Zaremba (2016) Openai gym
– reference: Kulkarni TD, Narasimhan K, Saeedi A, Josh T. (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Advances in neural information processing systems, pages 3675–3683
– reference: Ng AY, Harada D, Russell SJ (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the sixteenth international conference on machine learning, ISBN 1-55860-612-2. ICML ’99, pp 278–287. http://dl.acm.org/citation.cfm?id=645528.657613. Morgan Kaufmann Publishers Inc., CA. Accessed 28 July 2019.
– reference: Mao H, Zhang Z, Xiao Z, Gong Z (2019) Modelling the dynamic joint policy of teammates with attention multi-agent ddpg. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems. International foundation for autonomous agents and multiagent systems, pp 1108–1116
– reference: Makar R, Mahadevan S, Ghavamzadeh M (2001) Hierarchical multi-agent reinforcement learning. In: Proceedings of the fifth international conference on autonomous agents. ACM, pp 246–253
– reference: PrashanthLABhatnagarSReinforcement learning with function approximation for traffic signal controlIEEE Trans Intell Transp Syst2010122412421
– reference: Yu H (2015) On convergence of emphatic temporal-difference learning. In: Conference on learning theory, pp 1724–1751
– reference: PadullaparthiVRNagarathinamSVasanAMenonVSudarsanamDFalcon-farm level control for wind turbines using multi-agent deep reinforcement learningRenew Energy202218144545610.1016/j.renene.2021.09.023
– reference: Ian X (2018) A distributed reinforcement learning solution with knowledge transfer capability for a bike rebalancing problem. arXiv:1810.04058
– reference: Zuo X (2018) Mazelab: a customizable framework to create maze and gridworld environments. https://github.com/zuoxingdong/mazelab. Accessed 28 July 2019
– reference: SongYWojcickiALukasiewiczTWangJAryanAXuZXuMDingZWuLArena: a general evaluation platform and building toolkit for multi-agent intelligenceProc AAAI Conf Artif Intell2020340572537260https://doi.org/10.1609/aaai.v34i05.6216. https://ojs.aaai.org/index.php/AAAI/article/view/6216
– reference: Sorensen J, Mikkelsen R, Henningson D, Ivanell S, Sarmast S, Andersen S (2015) Simulation of wind turbine wakes using the actuator line technique. Philosophical Trans Series Math Phys Eng Sci. vol 373(02). https://doi.org/10.1098/rsta.20140071
– reference: Monireh A, Nasser M, Ana LC B (2011) Traffic light control in non-stationary environments based on multi agent q-learning. In: 2011 14th international IEEE conference on intelligent transportation systems (ITSC), pp 1580–1585. https://doi.org/10.1109/ITSC.2011.6083114
– reference: Bouton M, Farooq H, Forgeat J, Bothe S, Shirazipour M, Karlsson P (2021) Coordinated reinforcement learning for optimizing mobile networks, arXiv:2109.15175
– reference: Mohanty S, Nygren E, Laurent F, Schneider M, Scheller C, Bhattacharya N, Watson J, Egli A, Eichenberger C, Baumberger C et al (2020) Flatland-rl: Multi-agent reinforcement learning on trains. arXiv:2012.05893
– reference: TampuuAMatiisenTKodeljaDKuzovkinIKorjusKAruJAruJVicenteRMultiagent cooperation and competition with deep reinforcement learningPlos one2017124e017239510.1371/journal.pone.0172395
– reference: Andreas J, Rohrbach M, Darrell T, Klein D (2016) Neural module networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 39–48
– reference: BorkarVSMeynSPThe o.d.e. method for convergence of stochastic approximation and reinforcement learningSIAM J Control Optim200038244746917411480990.6207110.1137/S0363012997331639
– reference: Weiß G (1995) Distributed reinforcement learning. In: Luc Steels (ed) The Biology and technology of intelligent autonomous agents, pp 415–428. Berlin, Heidelberg. Springer Berlin Heidelberg
– reference: Panerati J, Zheng H, Zhou SQ, Xu J, Prorok A, Schoellig AP (2021) Learning to fly–a gym environment with pybullet physics for reinforcement learning of multiagent quadcopter control. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 7512– 7519
– reference: Ma H, Harabor D, Stuckey PJ, Li J, Koenig S (2019) Searching with consistent prioritization for multi-agent path finding. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 7643–7650
– reference: Hestenes MR, Stiefel E et al (1952) Methods of conjugate gradients for solving linear systems. NBS Washington, DC, vol 49
– reference: Fuji T, Ito K, Matsumoto K, Yano K (2018) Deep multi-agent reinforcement learning using dnn-weight evolution to optimize supply chain performance. In: Proceedings of the 51st Hawaii international conference on system sciences, vol 8
– reference: Max B, Guni S, Roni S, Ariel F (2014) Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In: Seventh annual symposium on combinatorial search. Citeseer
– reference: Lipton ZC, Gao J, Li L, Li X, Ahmed F, Li D (2016) Efficient exploration for dialog policy learning with deep bbq networks & replay buffer spiking. coRR abs/1608.05081
– reference: WangXWangHQiCMulti-agent reinforcement learning based maintenance policy for a resource constrained flow line systemJ Intell Manuf201627232533310.1007/s10845-013-0864-5
– reference: Berg JVD, Guy SJ, Lin M, Manocha D (2011) Reciprocal n-body collision avoidance. In: Robotics research. Springer pp 3–19
– reference: Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning-volume 70. JMLR. org, pp 2681–2690
– reference: BellemareMGNaddafYVenessJBowlingMThe arcade learning environment: an evaluation platform for general agentsJ Artif Intell Res20134725327910.1613/jair.3912
– reference: YousefiNTsianikasSCoitDWReinforcement learning for dynamic condition-based maintenance of a system with individually repairable componentsQual Eng202032338840810.1080/08982112.2020.1766692
– reference: Macua SV, Tukiainen A, Hernández DG-O, Baldazo D, de Cote EM, Zazo S (2018) Diff-dac: Distributed actor-critic for average multitask deep reinforcement learning. In: Adaptive learning agents (ALA) conference
– reference: LorenzoPDScutari GNext: in-network nonconvex optimizationIEEE Trans Signal Inf Process Over Netw201622120136355596210.1109/TSIPN.2016.2524588
– reference: Arthur J, Berges V-P, Vckay E, Gao Y, Henry H, Mattar M, Lange D (2018) Unity: a general platform for intelligent agents. arXiv:1809.02627
– reference: Donghwan L, Hyung-Jin Y, Naira H (2018) Primal-dual algorithm for distributed reinforcement learning: distributed GTD. 2018 IEEE Conf Decis Control (CDC):1967–1972
– reference: Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, Yi Y (2019) Learning to schedule communication in multi-agent reinforcement learning. In: International conference on learning representations. https://openreview.net/forum?id=SJxu5iR9KQ. Accessed 07 March 2020
– reference: Mordatch I, Abbeel P (2018a) Emergence of grounded compositional language in multi-agent populations. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
– reference: Sergio ValcarcelMJianshu CSantiago ZAli H SDistributed policy evaluation under multiple behavior strategiesIEEE Trans Auto Cont20156051260127433514101360.6871410.1109/TAC.2014.2368731https://doi.org/10.1109/TAC.2014.2368731
– reference: Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. In: Advances in neural information processing systems, pp 4190– 4203
– reference: ZhangHJiangHLuoYXiaoGData-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning methodIEEE Trans Ind Electron20176454091410010.1109/TIE.2016.2542134https://doi.org/10.1109/TIE.2016.2542134
– reference: WilliamsRSimple statistical gradient-following algorithms for connectionist reinforcement learningMach Learn199283-42292560772.6807610.1007/BF00992696
– reference: SilvaFLDCostaAHRA survey on transfer learning for multiagent reinforcement learning systemsJ Artif Intell Res20196464570339325591489.6822110.1613/jair.1.11396
– reference: Pan L, Cai Q, Meng Q, Chen W, Huang L (2020) Reinforcement learning with dynamic boltzmann softmax updates. In: Christian Bessiere (ed) Proceedings of the twenty-ninth international joint conference on artificial intelligence. International joint conferences on artificial intelligence organization, IJCAI-20, Main track, pp 1992–1998. https://doi.org/10.24963/ijcai.2020/276
– reference: Jan B, Steven Morad JG, Qingbiao L, Amanda P (2021) A framework for real-world multi-robot systems running decentralized gnn-based policies. arXiv:2111.01777
– reference: Lauer M, Riedmiller M (2000) An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the seventeenth international conference on machine learning. Citeseer
– reference: Raghuram Bharadwaj D, D Sai Koti R, Prabuchandran KJ, Shalabh B. (2019) Actor-critic algorithms for constrained multi-agent reinforcement learning. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems. Richland, SC, AAMAS?19. International foundation for autonomous agents and multiagent systems, pp 1931–1933
– reference: Son K, Kim D, Kang WJ, Hostallero ED, Yi Y (2019) Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 31st international conference on machine learning, proceedings of machine learning research. PMLR
– reference: William F, Prajit R, Rishabh A, Yoshua B, Hugo L, Mark R, Will D (2020) Revisiting fundamentals of experience replay. In: International conference on machine learning. PMLR, pp 3061–3071
– reference: Zhang C, Lesser V, Shenoy P (2009) A multi-agent learning approach to online distributed resource allocation. In: Twenty-first international joint conference on artificial intelligence
– reference: SchmidtMRouxNLBachFMinimizing finite sums with the stochastic average gradientMath Program20171621-28311236129331358.9007310.1007/s10107-016-1030-6
– reference: Lazaric A (2012) Transfer in reinforcement learning: a framework and a survey. In: Reinforcement learning. Springer, pp 143–173
– reference: SuttleWYangZZhangKWangZBaşarTLiuJA multi-agent off-policy actor-critic algorithm for distributed reinforcement learningIFAC-PapersOnLine. ISSN 2405-8963. 21th IFAC World Congress202053215491554https://doi.org/10.1016/j.ifacol.2020.12.2021. https://www.sciencedirect.com/science/article/pii/S2405896320326562
– reference: Sukhbaatar S, Szlam A, Synnaeve G, Chintala S, Fergus R (2015) Mazebase: a sandbox for learning from games. arXiv:1511.07401
– reference: Yang Z, Zhang K, Hong M (2018b) Tamer başar. A finite sample analysis of the actor-critic algorithm. In: IEEE Conference on decision and control (CDC). IEEE, pp 2759–2764
– reference: Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018a) Mean field multi-agent reinforcement learning. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, vol 80. of Proceedings of machine learning research, pp 5571–5580. Stockholmsmassan, Stockholm Sweden, 10–15 Jul PMLR
– reference: Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. Adv Neural Inf Process Syst. Deep learning workshop
– reference: LinL-JSelf-improving reactive agents based on reinforcement learning, planning and teachingMach Learn199283-429332110.1007/BF00992699
– reference: Liu B, Cai Q, Yang Z, Wang Z (2019) Neural trust region/proximal policy optimization attains globally optimal policy. In: Wallach H, Larochelle H, Beygelzimer A, d’ Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems. Curran Associates Inc., volume 32. https://proceedings.neurips.cc/paper/2019/file/227e072d131ba77451d8f27ab9afdfb7-Paper.pdf. Accessed 12 Apr 2020
– reference: Sam D, Daniel K (2011) Theoretical considerations of potential-based reward shaping for multi-agent systems. In: The 10th international conference on autonomous agents and multiagent systems-volume 1. International foundation for autonomous agents and multiagent systems, pp 225–232
– reference: Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330–337
– reference: NguyenTTNguyenNDNahavandiSDeep reinforcement learning for multiagent systems: a review of challenges, solutions, and applicationsIEEE Trans Cybern20205093826383910.1109/TCYB.2020.2977374
– reference: PennesiPPaschalidisICA distributed actor-critic algorithm and applications to mobile sensor network coordination problemsIEEE Trans Auto Cont201055249249726044301368.9002610.1109/TAC.2009.2037462ISSN 0018-9286. https://doi.org/10.1109/TAC.2009.2037462
– reference: LeeJParkJJangminOLeeJHongEA multiagent approach to q-learning for daily stock tradingIEEE Trans Syst Man Cybern A: Syst Hum200737686487710.1109/TSMCA.2007.904825
– reference: Bei P, Tabish R, Christian Schroeder de W, Pierre-Alexandre K, Philip T, Wendelin B, Shimon W (2021) Facmac: Factored multi-agent centralised policy gradients. Adv Neural Inf Process Syst, vol 34
– reference: Wang RE, Everett M, How JP (2019b) R-maddpg for partially observable environments and limited communication. ICML 2019 Workshop RL4reallife
– reference: Wang Y, Han B, Wang T, Dong H, Zhang C (2020c) Dop: Off-policy multi-agent decomposed policy gradients. In: International conference on learning representations
– reference: FitouhiM-CNourelfathMGershwinSBPerformance evaluation of a two-machine line with a finite buffer and condition-based maintenanceReliab Eng Syst2017166617210.1016/j.ress.2017.03.034
– reference: Kyuree AJinkyoo PCooperative zone-based rebalancing of idle overhead hoist transportations using multi-agent reinforcement learning with graph representation learningIISE Trans202100117https://doi.org/10.1080/24725854.2020.1851823 https://doi.org/10.1080/24725854.2020.1851823
– reference: Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning. PMLR, pp 1861–1870
– reference: WatkinsCJDayanPQ-learningMach Learn199283-42792920773.6806210.1007/BF00992698
– reference: ChuTWangJCodecàLLiZMulti-agent deep reinforcement learning for large-scale traffic signal controlIEEE Trans Intell Transp Syst20192131086109510.1109/TITS.2019.2901791
– reference: Jakob NF, Gregory F, Triantafyllos A, Nantas N, Shimon W (2018) Counterfactual multi-agent policy gradients. In: Thirty-second AAAI conference on artificial intelligence
– reference: Zhang C, Li X, Hao J, Chen S, Tuyls K, Xue W, Feng Z (2018a) Scc-rfmq learning in cooperative markov games with continuous actions. In: Proceedings of the 17th international Conference on Autonomous Agents and MultiAgent systems. International foundation for autonomous agents and Multiagent systems, pp 2162–2164
– reference: Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, Elibolm M, Yang Z, Paul W, Jordan M et al (2018) Ray: a distributed framework for emerging fAIg applications. In: 13th {USENIX} symposium on operating systems design and implementation (fOSDIg 18), pp 561–577
– reference: Yuxi L (2017) Deep reinforcement learning: an overview. arXiv:1701.07274
– reference: Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K et al (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, pp 2085–2087. International foundation for autonomous agents and multiagent systems
– reference: Wei H, Zheng G, Gayah V, Li Z (2019c) A survey on traffic signal control methods. arXiv:1904.08117
– reference: WangSWanJZhangDLiDZhangCTowards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordinationComput Netw201610115816810.1016/j.comnet.2015.12.017
– reference: CuiRBoGJiGPareto-optimal coordination of multiple robots with safety guaranteesAuton Robot201232318920510.1007/s10514-011-9265-9
– reference: Jiang S, Amato C (2021) Multi-agent reinforcement learning with directed exploration and selective memory reuse. In: Proceedings of the 36th annual ACM symposium on applied computing, pp 777–784
– reference: Prabuchandran KJ, Hemanth Kumar AN, Bhatnagar S (2014) Multi-agent reinforcement learning for traffic signal control. In: 17th international IEEE conference on intelligent transportation systems (ITSC). IEEE, pp 2529–2534
– reference: Lanctot M, Lockhart E, Lespiau J-B, Zambaldi V, Upadhyay S, Pérolat J, Srinivasan S, Timbers F, Tuyls K, Omidshafiei S et al (2019) Openspiel: a framework for reinforcement learning in games. arXiv:1908.09453
– reference: MirhoseiniAGoldieAYazganMJiangJSonghoriEWangSLeeY-JJohnsonEPathakONaziAA graph placement methodology for fast chip designNature2021594786220721210.1038/s41586-021-03544-w
– reference: Gabel T, Riedmiller M (2007) On a successful application of multi-agent reinforcement learning to operations research benchmarks. In: 2007 IEEE international symposium on approximate dynamic programming and reinforcement learning. IEEE, pp 68–75
– reference: Sanchez G, Latombe J-C (2002) Using a prm planner to compare centralized and decoupled planning for multi-robot systems. In: Proceedings 2002 IEEE international conference on robotics and automation (Cat. No. 02CH37292), vol 2, pp 2112–2119
– reference: Nazari M, Oroojlooy A, Snyder L, Takác M. (2018) Reinforcement learning for solving the vehicle routing problem
– reference: Lazaric A, Restelli M, Bonarini A (2008) Reinforcement learning in continuous action spaces through sequential monte carlo methods. In: Advances in neural information processing systems, pp 833–840
– reference: Kempka M, Wydmuch M, Runc G, Toczek J, Jaśkowski W (2016) ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE conference on computational intelligence and games. IEEE, Santorini, The best paper award, pp 341–348
– reference: Ryu H, Shin H, Park J (2018) Multi-agent actor-critic with generative cooperative policy network. arXiv:1810.09206
StartPage 13677
SubjectTerms Algorithms
Artificial Intelligence
Computer Science
Deep learning
Machine learning
Machines
Manufacturing
Mechanical Engineering
Multiagent systems
Processes
Title A review of cooperative multi-agent deep reinforcement learning
URI https://link.springer.com/article/10.1007/s10489-022-04105-y
https://www.proquest.com/docview/2821175683
Volume 53