Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Published in | Mathematical Problems in Engineering, Vol. 2020, no. 2020, pp. 1–12 |
---|---|
Main Authors | Wu, Junta; Li, Huiyun |
Format | Journal Article |
Language | English |
Published | Cairo, Egypt: Hindawi Publishing Corporation, 2020 (Hindawi; John Wiley & Sons, Inc) |
ISSN | 1024-123X (print); 1563-5147 (electronic) |
DOI | 10.1155/2020/4275623 |
Abstract | The deep deterministic policy gradient (DDPG) algorithm, which operates over continuous action spaces, has attracted great attention in reinforcement learning. However, the exploration strategy based on dynamic programming within the Bayesian belief state space is rather inefficient, even for simple systems. Another problem is that training data collected sequentially and iteratively by autonomous vehicles are subject to the law of causality, which violates the i.i.d. (independent and identically distributed) assumption on training samples. This usually causes the standard bootstrap to fail when learning an optimal policy. In this paper, we propose a framework of m-out-of-n bootstrapped and aggregated multiple DDPG to accelerate the training process and improve performance. Experimental results on the 2D robot arm game show that the reward gained by the aggregated policy is 10%–50% higher than that gained by the subpolicies. Experimental results on The Open Racing Car Simulator (TORCS) demonstrate that the new algorithm can learn successful control policies with 56.7% less training time. A convergence analysis is also given from the perspective of probability and statistics. These results verify that the proposed method outperforms existing algorithms in both efficiency and performance. |
---|---|
Author | Wu, Junta; Li, Huiyun |
Copyright | Copyright © 2020 Junta Wu and Huiyun Li. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/4.0 |
Discipline | Applied Sciences |
Editor | Jędrzejowicz, Piotr |
GrantInformation | National Natural Science Foundation of China (grants 61672512 and 51707191); Shenzhen Engineering Laboratory for Autonomous Driving Technology; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems; Shenzhen Institutes of Advanced Technology |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0003-0157-1393 |
OpenAccessLink | https://dx.doi.org/10.1155/2020/4275623 |
PublicationPlace | Cairo, Egypt; New York |
SubjectTerms | Algorithms; Computer simulation; Dynamic programming; Efficiency; Interactive learning; Machine learning; Mathematical problems; Methods; Neural networks; Race cars; Robot arms; Statistical methods; Training |
URI | https://search.emarefa.net/detail/BIM-1195068; https://dx.doi.org/10.1155/2020/4275623; https://www.proquest.com/docview/2350023125 |
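The abstract names two ingredients, m-out-of-n bootstrapping and aggregation of multiple DDPG subpolicies, but this record gives no algorithmic detail. The sketch below illustrates both under loudly stated assumptions that are not taken from the paper: each of k subpolicies trains on its own independent m-out-of-n resample of a shared replay buffer, and the aggregated policy is the per-dimension mean of the subpolicies' deterministic actions, clipped to the action bounds. All function names (`m_out_of_n_bootstrap`, `bootstrap_batches`, `aggregate_action`), the toy linear "actors", and the values of k and m are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
# Hypothetical policy type: maps a (scalar) state to an action vector.
Policy = Callable[[float], List[float]]


def m_out_of_n_bootstrap(buffer: Sequence[T], m: int, rng: random.Random) -> List[T]:
    """Draw m items with replacement from the n = len(buffer) stored items.

    With m == n this reduces to the standard (n-out-of-n) bootstrap;
    the m-out-of-n variant uses a smaller resample size m < n.
    """
    n = len(buffer)
    if not 0 < m <= n:
        raise ValueError("m-out-of-n bootstrap requires 0 < m <= n")
    return [buffer[rng.randrange(n)] for _ in range(m)]


def bootstrap_batches(buffer: Sequence[T], k: int, m: int, seed: int = 0) -> List[List[T]]:
    """Assumption: one independent m-out-of-n resample per subpolicy (k in total)."""
    rng = random.Random(seed)
    return [m_out_of_n_bootstrap(buffer, m, rng) for _ in range(k)]


def aggregate_action(subpolicies: Sequence[Policy], state: float,
                     low: float, high: float) -> List[float]:
    """Assumption: average the subpolicies' actions per dimension, then clip to [low, high]."""
    actions = [pi(state) for pi in subpolicies]
    k = len(actions)
    mean = [sum(a[i] for a in actions) / k for i in range(len(actions[0]))]
    return [min(max(x, low), high) for x in mean]


if __name__ == "__main__":
    buffer = list(range(1000))  # stand-in for 1000 stored transitions
    batches = bootstrap_batches(buffer, k=3, m=64, seed=42)
    print([len(b) for b in batches])  # [64, 64, 64]

    # Three toy linear "actors" with a 2-D action, standing in for trained DDPG actors.
    subpolicies = [lambda s, w=w: [w * s, -w * s] for w in (0.5, 1.0, 1.5)]
    print(aggregate_action(subpolicies, 0.4, low=-1.0, high=1.0))
```

Keeping m below n is what separates this resampling scheme from the standard bootstrap that the abstract says tends to fail on causally dependent (non-i.i.d.) driving data; the specific m used here is arbitrary. Averaging is only one plausible aggregation rule for deterministic continuous actions, chosen because it is simple and keeps the aggregated action inside the convex hull of the subpolicies' actions before clipping.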