Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1—Fundamentals and Applications in Games, Robotics and Natural Language Processing

Bibliographic Details
Published in Machine Learning and Knowledge Extraction, Vol. 3, No. 3, pp. 554-581
Main Authors: Xiang, Xuanchen (ORCID: 0000-0002-4897-2111); Foo, Simon
Format Journal Article
Language English
Published Basel: MDPI AG, 01.09.2021
Subjects: Algorithms; Deep learning; deep reinforcement learning; Dynamic programming; Expected utility; Games; Machine learning; Markov analysis; Markov decision process; Markov processes; Natural language processing; partially observable Markov decision process; Probability; Probability distribution; reinforcement learning; Robotics; Transportation applications
ISSN 2504-4990
EISSN 2504-4990
DOI 10.3390/make3030029

Abstract The first part of this two-part series of papers surveys recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach that simulates the human's natural learning process; its key idea is to let an agent learn by interacting with a stochastic environment. Because RL works even when the agent has only limited access to information about the environment, it can be applied efficiently in most fields that require self-learning. Although efficient algorithms are already in wide use, an organized investigation remains essential: it enables sound comparisons, so that the best structures or algorithms can be chosen when applying DRL to various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and we review applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industries.
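For orientation (standard definitions and an illustrative sketch, not taken from the paper or this record): an MDP is the tuple (S, A, T, R, γ), and a POMDP extends it to (S, A, T, R, Ω, O, γ), where the agent never sees the hidden state s′ directly but only an observation o ∈ Ω drawn from O(o | s′, a). The agent-environment loop the abstract describes might look like the following minimal Python sketch, in which the environment (NoisyCorridor), its noise model, and all hyperparameters are invented for the example; it runs tabular Q-learning naively on noisy observations, the setting that motivates the DRL methods such surveys cover.

import random
from collections import defaultdict

class NoisyCorridor:
    """Toy POMDP: the hidden state is a cell 0..4 with the goal at 4;
    the agent sees the cell only through a noisy sensor, never directly."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self._observe()

    def _observe(self):
        # With probability 2/5 the sensor reports a neighboring cell.
        return min(4, max(0, self.state + random.choice([-1, 0, 0, 0, 1])))

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.state = min(4, max(0, self.state + (1 if action == 1 else -1)))
        done = self.state == 4
        return self._observe(), (1.0 if done else -0.01), done

# Tabular Q-learning applied directly to observations: a common baseline,
# though it wrongly treats observations as if they were true states.
q = defaultdict(lambda: [0.0, 0.0])
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # illustrative hyperparameters
env = NoisyCorridor()
for _ in range(500):
    obs, done = env.reset(), False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(2)                    # explore
        else:
            action = max((0, 1), key=lambda a: q[obs][a])   # exploit
        nxt, reward, done = env.step(action)
        target = reward + gamma * max(q[nxt]) * (not done)
        q[obs][action] += alpha * (target - q[obs][action])
        obs = nxt
print({o: [round(v, 2) for v in pair] for o, pair in sorted(q.items())})

Because the table is indexed by observations rather than states, the learned values stay blurred by sensor noise; DRL methods for POMDPs typically replace the table with function approximators conditioned on observation histories or belief states.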
Copyright 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).