Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning

Bibliographic Details
Published in Frontiers in neurorobotics, Vol. 13, p. 103
Main Authors Ohnishi, Shota; Uchibe, Eiji; Yamaguchi, Yotaro; Nakanishi, Kosuke; Yasui, Yuji; Ishii, Shin
Format Journal Article
Language English
Published Switzerland: Frontiers Research Foundation / Frontiers Media S.A., 10.12.2019
Abstract A deep Q network (DQN) (Mnih et al., 2013), a typical deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. In DQN, a target network, which calculates a target value and is updated by the Q function at regular intervals, is introduced to stabilize the learning process. Less frequent updates of the target network result in a more stable learning process. However, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates parameters conservatively when the difference between the outputs of the Q function and the target network is large, and it updates them aggressively when this difference is small. As learning progresses, the number of times the constraint is activated decreases; consequently, the update method gradually approaches conventional Q-learning. We found that Constrained DQN converges with a smaller training dataset than DQN and that it is robust against changes in the update frequency of the target network and in the settings of a certain parameter of the optimizer. Although Constrained DQN alone does not show better performance than integrated approaches or distributed methods, experimental results show that it can be used as an additional component of those methods.
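The constraint described in the abstract can be pictured with a short sketch. The snippet below is only one possible reading of the idea, not the paper's exact formulation: the networks, the threshold epsilon, and the rule that switches between bootstrapping from the target network (conservative) and from the online network (aggressive) are illustrative assumptions.

import torch
import torch.nn.functional as F

def constrained_dqn_loss(q_net, target_net, batch, gamma=0.99, epsilon=1.0):
    # batch: (states, actions, rewards, next_states, done_flags); actions must be a LongTensor.
    s, a, r, s_next, done = batch

    # Q(s, a) under the online network: the quantity being regressed toward the target value.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        q_next_online = q_net(s_next).max(dim=1).values       # ordinary Q-learning bootstrap
        q_next_target = target_net(s_next).max(dim=1).values  # standard DQN (target-network) bootstrap
        gap = (q_next_online - q_next_target).abs()

        # When the two networks disagree strongly, fall back to the conservative
        # target-network value; when they agree, bootstrap from the online network.
        bootstrap = torch.where(gap > epsilon, q_next_target, q_next_online)
        y = r + gamma * (1.0 - done.float()) * bootstrap

    return F.smooth_l1_loss(q_sa, y)

Under this reading, once the online and target networks agree within epsilon everywhere, the update bootstraps from the online network alone, i.e., it behaves like ordinary Q-learning, which is the "gradually approaching" behavior the abstract describes.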
Author Yasui, Yuji
Yamaguchi, Yotaro
Ohnishi, Shota
Ishii, Shin
Uchibe, Eiji
Nakanishi, Kosuke
AuthorAffiliation 1 Department of Systems Science, Graduate School of Informatics, Kyoto University, Now Affiliated With Panasonic Co., Ltd., Kyoto, Japan
2 ATR Computational Neuroscience Laboratories, Kyoto, Japan
3 Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
4 Honda R&D Co., Ltd., Saitama, Japan
Copyright Copyright © 2019 Ohnishi, Uchibe, Yamaguchi, Nakanishi, Yasui and Ishii.
DOI 10.3389/fnbot.2019.00103
Discipline Engineering
EISSN 1662-5218
EndPage 103
Genre Journal Article
GeographicLocations Japan
GrantInformation New Energy and Industrial Technology Development Organization; Japan Society for the Promotion of Science
ISSN 1662-5218
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords constrained reinforcement learning
target network
deep reinforcement learning
learning stabilization
regularization
deep Q network
Language English
License Copyright © 2019 Ohnishi, Uchibe, Yamaguchi, Nakanishi, Yasui and Ishii.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Notes Edited by: Hong Qiao, University of Chinese Academy of Sciences, China
Reviewed by: Jiwen Lu, Tsinghua University, China; David Haim Silver, Independent Researcher, Haifa, Israel; Timothy P. Lillicrap, Google, United States
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914867/
PMID 31920613
PageCount 1
PublicationDate 2019-12-10
PublicationPlace Switzerland
PublicationTitle Frontiers in neurorobotics
PublicationTitleAlternate Front Neurorobot
PublicationYear 2019
Publisher Frontiers Research Foundation
Frontiers Media S.A
References Achiam (2019). Towards characterizing divergence in deep Q-learning. arXiv:1903.08894.
Andrychowicz (2017). "Hindsight experience replay," in Advances in Neural Information Processing Systems, Vol. 30, 5048.
Anschel (2017). "Averaged-DQN: variance reduction and stabilization for deep reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning, 176.
Azar (2011). "Speedy Q-learning," in Advances in Neural Information Processing Systems, Vol. 24, 2411.
Baird (1995). "Residual algorithms: reinforcement learning with function approximation," in Proceedings of the 12th International Conference on Machine Learning, 30.
Bellemare (2016). "Increasing the action gap: new operators for reinforcement learning," in Proceedings of the 30th AAAI Conference on Artificial Intelligence. doi: 10.1609/aaai.v30i1.10303
Brockman (2016). OpenAI Gym. arXiv:1606.01540.
Durugkar (2017). "TD learning with constrained gradients," in Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017.
Elfwing (2016). From free energy to expected energy: improving energy-based value function approximation in reinforcement learning. Neural Netw. 84, 17. doi: 10.1016/j.neunet.2016.07.013
Fortunato (2018). "Noisy networks for exploration," in Proceedings of the 6th International Conference on Learning Representations.
Fujimoto (2018). "Addressing function approximation error in actor-critic methods," in Proceedings of the 35th International Conference on Machine Learning.
Graves (2013). Generating sequences with recurrent neural networks. arXiv:1308.0850.
Gu (2016). "Continuous deep Q-learning with model-based acceleration," in Proceedings of the 33rd International Conference on Machine Learning, 2829.
Haarnoja (2017). "Reinforcement learning with deep energy-based policies," in Proceedings of the 34th International Conference on Machine Learning, 1352.
He (2017). "Learning to play in a day: faster deep reinforcement learning by optimality tightening," in Proceedings of the 5th International Conference on Learning Representations.
Hernandez-Garcia (2019). Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv:1901.07510.
Hessel (2017). "Rainbow: combining improvements in deep reinforcement learning," in Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
Horgan (2018). "Distributed prioritized experience replay," in Proceedings of the 6th International Conference on Learning Representations.
Kahn (2017). "PLATO: policy learning using adaptive trajectory optimization," in Proceedings of the IEEE International Conference on Robotics and Automation, 3342.
Kapturowski (2019). "Recurrent experience replay in distributed reinforcement learning," in Proceedings of the 7th International Conference on Learning Representations.
Karimpanal (2018). Experience replay using transition sequences. Front. Neurorobot. 21, 32. doi: 10.3389/fnbot.2018.00032
Kim (2019). "Deepmellow: removing the need for a target network in deep Q-learning," in Proceedings of the 28th International Joint Conference on Artificial Intelligence. doi: 10.24963/ijcai.2019/379
Kozuno (2019). "Theoretical analysis of efficiency and robustness of softmax and gap-increasing operators in reinforcement learning," in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2995.
Levine (2016). "Learning hand-eye coordination for robotic grasping with large-scale data collection," in Proceedings of the International Symposium on Experimental Robotics, 173.
Lillicrap (2016). "Continuous control with deep reinforcement learning," in Proceedings of the 4th International Conference on Learning Representations.
Lin (1993). Reinforcement Learning for Robots Using Neural Networks.
Mnih (2013). Playing Atari with deep reinforcement learning. arXiv:1312.5602.
Mnih (2015). Human-level control through deep reinforcement learning. Nature 518, 529. doi: 10.1038/nature14236
Mnih (2016). "Asynchronous methods for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning, 1928.
Plappert (2018). "Parameter space noise for exploration," in Proceedings of the 6th International Conference on Learning Representations.
Pohlen (2018). Observe and look further: achieving consistent performance on Atari. arXiv:1805.11593.
Riedmiller (2005). "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," in Proceedings of the 16th European Conference on Machine Learning, 317.
Schaul (2015). "Prioritized experience replay," in Proceedings of the 4th International Conference on Learning Representations.
Silver (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529, 484. doi: 10.1038/nature16961
Silver (2017). Mastering the game of Go without human knowledge. Nature 550, 354. doi: 10.1038/nature24270
Sutton (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series).
Tsitsiklis (1997). Analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42, 674. doi: 10.1109/9.580874
Tsurumine (2019). Deep reinforcement learning with smooth policy update: application to robotic cloth manipulation. Robot. Auton. Syst. 112, 72. doi: 10.1016/j.robot.2018.11.004
van Hasselt (2010). "Double Q-learning," in Advances in Neural Information Processing Systems, Vol. 23, 2613.
van Hasselt (2016). "Deep reinforcement learning with double Q-learning," in Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2094.
van Hasselt (2019). "When to use parametric models in reinforcement learning?" in Advances in Neural Information Processing Systems.
Wang (2016). "Dueling network architectures for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning.
Watkins (1992). Q-learning. Mach. Learn. 8, 279. doi: 10.1007/BF00992698
Yang (2019). A theoretical analysis of deep Q-learning. arXiv:1901.00137.
Zhang (2017). "Weighted double Q-learning," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 3455.
Ziebart (2008). "Maximum entropy inverse reinforcement learning," in Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 1433.
StartPage 103
SubjectTerms Algorithms
Artificial intelligence
constrained reinforcement learning
Deep learning
deep Q network
deep reinforcement learning
Informatics
Information processing
International conferences
Learning
learning stabilization
Machine learning
Neural networks
Neuroscience
regularization
Robots
Systems science
target network
Teaching methods
URI https://www.ncbi.nlm.nih.gov/pubmed/31920613
https://www.proquest.com/docview/2323241531/abstract/
https://search.proquest.com/docview/2336253442
https://pubmed.ncbi.nlm.nih.gov/PMC6914867
https://doaj.org/article/b93b5b7b16874660b7e0bde3e5ef1057
Volume 13