Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1—Fundamentals and Applications in Games, Robotics and Natural Language Processing

Bibliographic Details
Published in Machine Learning and Knowledge Extraction, Vol. 3, No. 3, pp. 554-581
Main Authors: Xiang, Xuanchen (ORCID: 0000-0002-4897-2111); Foo, Simon
Format Journal Article
Language English
Published Basel: MDPI AG, 01.09.2021
Subjects: Algorithms; Deep learning; deep reinforcement learning; Dynamic programming; Expected utility; Games; Machine learning; Markov analysis; Markov decision process; Markov processes; Natural language processing; partially observable Markov decision process; Probability; Probability distribution; reinforcement learning; Robotics; Transportation applications
ISSN 2504-4990
EISSN 2504-4990
DOI 10.3390/make3030029

Abstract The first part of this two-part series of papers surveys recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach that simulates the human's natural learning process; its key idea is to let an agent learn by interacting with a stochastic environment. Because RL works even when the agent has only limited access to information about the environment, it can be applied efficiently in most fields that require self-learning. Although efficient algorithms are already in wide use, an organized investigation remains essential: it enables sound comparisons, so that the best structures or algorithms can be chosen when applying DRL to various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and we review applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industries.
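For orientation (standard definitions and an illustrative sketch, not taken from the paper or this record): an MDP is the tuple (S, A, T, R, γ), and a POMDP extends it to (S, A, T, R, Ω, O, γ), where the agent never sees the hidden state s′ directly but only an observation o ∈ Ω drawn from O(o | s′, a). The agent-environment loop the abstract describes might look like the following minimal Python sketch, in which the environment (NoisyCorridor), its noise model, and all hyperparameters are invented for the example; it runs tabular Q-learning naively on noisy observations, the setting that motivates the DRL methods such surveys cover.

import random
from collections import defaultdict

class NoisyCorridor:
    """Toy POMDP: the hidden state is a cell 0..4 with the goal at 4;
    the agent sees the cell only through a noisy sensor, never directly."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self._observe()

    def _observe(self):
        # With probability 2/5 the sensor reports a neighboring cell.
        return min(4, max(0, self.state + random.choice([-1, 0, 0, 0, 1])))

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.state = min(4, max(0, self.state + (1 if action == 1 else -1)))
        done = self.state == 4
        return self._observe(), (1.0 if done else -0.01), done

# Tabular Q-learning applied directly to observations: a common baseline,
# though it wrongly treats observations as if they were true states.
q = defaultdict(lambda: [0.0, 0.0])
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # illustrative hyperparameters
env = NoisyCorridor()
for _ in range(500):
    obs, done = env.reset(), False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(2)                    # explore
        else:
            action = max((0, 1), key=lambda a: q[obs][a])   # exploit
        nxt, reward, done = env.step(action)
        target = reward + gamma * max(q[nxt]) * (not done)
        q[obs][action] += alpha * (target - q[obs][action])
        obs = nxt
print({o: [round(v, 2) for v in pair] for o, pair in sorted(q.items())})

Because the table is indexed by observations rather than states, the learned values stay blurred by sensor noise; DRL methods for POMDPs typically replace the table with function approximators conditioned on observation histories or belief states.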
Copyright 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).