Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
Published in | Frontiers in Neurorobotics Vol. 13, p. 103
Main Authors | Shota Ohnishi, Eiji Uchibe, Yotaro Yamaguchi, Kosuke Nakanishi, Yuji Yasui, Shin Ishii
Format | Journal Article |
Language | English |
Published | Switzerland: Frontiers Research Foundation / Frontiers Media S.A., 10 December 2019
Abstract | A deep Q network (DQN) (Mnih et al., 2013), a representative deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural network. Using the approximated Q function, an optimal policy can be derived. To stabilize the learning process, DQN introduces a target network, which calculates the target value and is updated by the Q function at regular intervals. Less frequent updates of the target network yield a more stable learning process; however, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates parameters conservatively when this difference is large, and aggressively when it is small. As learning progresses, the constraint is activated less and less often, so the update rule gradually approaches conventional Q-learning. We found that Constrained DQN converges with a smaller training dataset than DQN and that it is robust against changes in the update frequency of the target network and in a certain parameter of the optimizer. Although Constrained DQN alone does not outperform integrated or distributed methods, experimental results show that it can be used as an additional component of those methods.
Author | Ohnishi, Shota; Uchibe, Eiji; Yamaguchi, Yotaro; Nakanishi, Kosuke; Yasui, Yuji; Ishii, Shin
AuthorAffiliation | 1 Department of Systems Science, Graduate School of Informatics, Kyoto University (now affiliated with Panasonic Co., Ltd.), Kyoto, Japan; 2 ATR Computational Neuroscience Laboratories, Kyoto, Japan; 3 Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan; 4 Honda R&D Co., Ltd., Saitama, Japan
Author_xml | 1. Ohnishi, Shota (Department of Systems Science, Graduate School of Informatics, Kyoto University, now affiliated with Panasonic Co., Ltd., Kyoto, Japan); 2. Uchibe, Eiji (ATR Computational Neuroscience Laboratories, Kyoto, Japan); 3. Yamaguchi, Yotaro (Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan); 4. Nakanishi, Kosuke (Honda R&D Co., Ltd., Saitama, Japan); 5. Yasui, Yuji (Honda R&D Co., Ltd., Saitama, Japan); 6. Ishii, Shin (Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan)
Copyright | Copyright © 2019 Ohnishi, Uchibe, Yamaguchi, Nakanishi, Yasui and Ishii. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI | 10.3389/fnbot.2019.00103 |
Discipline | Engineering |
EISSN | 1662-5218 |
EndPage | 103 |
GeographicLocations | Japan |
GrantInformation | New Energy and Industrial Technology Development Organization; Japan Society for the Promotion of Science
ISSN | 1662-5218 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | constrained reinforcement learning; target network; deep reinforcement learning; learning stabilization; regularization; deep Q network
License | Copyright © 2019 Ohnishi, Uchibe, Yamaguchi, Nakanishi, Yasui and Ishii. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
Notes | Edited by: Hong Qiao, University of Chinese Academy of Sciences, China. Reviewed by: Jiwen Lu, Tsinghua University, China; David Haim Silver, Independent Researcher, Haifa, Israel; Timothy P. Lillicrap, Google, United States.
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914867/ |
PMID | 31920613 |
PageCount | 1 |
PublicationDate | 2019-12-10 |
PublicationPlace | Lausanne, Switzerland
PublicationTitle | Frontiers in neurorobotics |
PublicationTitleAlternate | Front Neurorobot |
PublicationYear | 2019 |
Publisher | Frontiers Research Foundation Frontiers Media S.A |
References |
Achiam (2019). Towards characterizing divergence in deep Q-learning. arXiv [Preprint]. arXiv:1903.08894.
Andrychowicz (2017). "Hindsight experience replay," in Advances in Neural Information Processing Systems, Vol. 30, 5048.
Anschel (2017). "Averaged-DQN: variance reduction and stabilization for deep reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning, 176.
Azar (2011). "Speedy Q-learning," in Advances in Neural Information Processing Systems, Vol. 24, 2411.
Baird (1995). "Residual algorithms: reinforcement learning with function approximation," in Proceedings of the 12th International Conference on Machine Learning, 30.
Bellemare (2016). "Increasing the action gap: new operators for reinforcement learning," in Proceedings of the 30th AAAI Conference on Artificial Intelligence. doi: 10.1609/aaai.v30i1.10303
Brockman (2016). OpenAI Gym. arXiv [Preprint]. arXiv:1606.01540.
Durugkar (2017). "TD learning with constrained gradients," in Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017.
Elfwing (2016). From free energy to expected energy: improving energy-based value function approximation in reinforcement learning. Neural Netw. 84, 17. doi: 10.1016/j.neunet.2016.07.013
Fortunato (2018). "Noisy networks for exploration," in Proceedings of the 6th International Conference on Learning Representations.
Fujimoto (2018). "Addressing function approximation error in actor-critic methods," in Proceedings of the 35th International Conference on Machine Learning.
Graves (2013). Generating sequences with recurrent neural networks. arXiv [Preprint]. arXiv:1308.0850.
Gu (2016). "Continuous deep Q-learning with model-based acceleration," in Proceedings of the 33rd International Conference on Machine Learning, 2829.
Haarnoja (2017). "Reinforcement learning with deep energy-based policies," in Proceedings of the 34th International Conference on Machine Learning, 1352.
He (2017). "Learning to play in a day: faster deep reinforcement learning by optimality tightening," in Proceedings of the 5th International Conference on Learning Representations.
Hernandez-Garcia (2019). Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv [Preprint]. arXiv:1901.07510.
Hessel (2017). "Rainbow: combining improvements in deep reinforcement learning," in Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
Horgan (2018). "Distributed prioritized experience replay," in Proceedings of the 6th International Conference on Learning Representations.
Kahn (2017). "PLATO: policy learning using adaptive trajectory optimization," in Proceedings of the IEEE International Conference on Robotics and Automation, 3342.
Kapturowski (2019). "Recurrent experience replay in distributed reinforcement learning," in Proceedings of the 7th International Conference on Learning Representations.
Karimpanal (2018). Experience replay using transition sequences. Front. Neurorobot. 12, 32. doi: 10.3389/fnbot.2018.00032
Kim (2019). "Deepmellow: removing the need for a target network in deep Q-learning," in Proceedings of the 28th International Joint Conference on Artificial Intelligence. doi: 10.24963/ijcai.2019/379
Kozuno (2019). "Theoretical analysis of efficiency and robustness of softmax and gap-increasing operators in reinforcement learning," in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2995.
Levine (2016). "Learning hand-eye coordination for robotic grasping with large-scale data collection," in Proceedings of the International Symposium on Experimental Robotics, 173.
Lillicrap (2016). "Continuous control with deep reinforcement learning," in Proceedings of the 4th International Conference on Learning Representations.
Lin (1993). Reinforcement Learning for Robots Using Neural Networks.
Mnih (2013). Playing Atari with deep reinforcement learning. arXiv [Preprint]. arXiv:1312.5602.
Mnih (2015). Human-level control through deep reinforcement learning. Nature 518, 529. doi: 10.1038/nature14236
Mnih (2016). "Asynchronous methods for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning, 1928.
Plappert (2018). "Parameter space noise for exploration," in Proceedings of the 6th International Conference on Learning Representations.
Pohlen (2018). Observe and look further: achieving consistent performance on Atari. arXiv [Preprint]. arXiv:1805.11593.
Riedmiller (2005). "Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method," in Proceedings of the 16th European Conference on Machine Learning, 317.
Schaul (2015). "Prioritized experience replay," in Proceedings of the 4th International Conference on Learning Representations.
Silver (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529, 484. doi: 10.1038/nature16961
Silver (2017). Mastering the game of Go without human knowledge. Nature 550, 354. doi: 10.1038/nature24270
Sutton (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series).
Tsitsiklis (1997). Analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42, 674. doi: 10.1109/9.580874
Tsurumine (2019). Deep reinforcement learning with smooth policy update: application to robotic cloth manipulation. Robot. Auton. Syst. 112, 72. doi: 10.1016/j.robot.2018.11.004
van Hasselt (2010). "Double Q-learning," in Advances in Neural Information Processing Systems, Vol. 23, 2613.
van Hasselt (2016). "Deep reinforcement learning with double Q-learning," in Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2094.
van Hasselt (2019). "When to use parametric models in reinforcement learning?" in Advances in Neural Information Processing Systems.
Wang (2016). "Dueling network architectures for deep reinforcement learning," in Proceedings of the 33rd International Conference on Machine Learning.
Watkins (1992). Q-learning. Mach. Learn. 8, 279. doi: 10.1007/BF00992698
Yang (2019). A theoretical analysis of deep Q-learning. arXiv [Preprint]. arXiv:1901.00137.
Zhang (2017). "Weighted double Q-learning," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 3455.
Ziebart (2008). "Maximum entropy inverse reinforcement learning," in Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 1433.
StartPage | 103 |
SubjectTerms | Algorithms; Artificial intelligence; constrained reinforcement learning; Deep learning; deep Q network; deep reinforcement learning; Informatics; Information processing; International conferences; Learning; learning stabilization; Machine learning; Neural networks; Neuroscience; regularization; Robots; Systems science; target network; Teaching methods
URI | https://www.ncbi.nlm.nih.gov/pubmed/31920613; https://www.proquest.com/docview/2323241531/abstract/; https://search.proquest.com/docview/2336253442; https://pubmed.ncbi.nlm.nih.gov/PMC6914867; https://doaj.org/article/b93b5b7b16874660b7e0bde3e5ef1057
Volume | 13 |