Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm

Bibliographic Details
Published in	Mathematical Problems in Engineering, Vol. 2020, no. 2020, pp. 1-12
Main Authors	Wu, Junta; Li, Huiyun
Format	Journal Article
Language	English
Published	Cairo, Egypt: Hindawi Publishing Corporation, 2020; Hindawi; John Wiley & Sons, Inc
Subjects	Algorithms; Computer simulation; Dynamic programming; Efficiency; Interactive learning; Machine learning; Mathematical problems; Methods; Neural networks; Race cars; Robot arms; Statistical methods; Training
ISSN	1024-123X (print); 1563-5147 (electronic)
DOI	10.1155/2020/4275623

Abstract	The deep deterministic policy gradient (DDPG) algorithm, which operates over a continuous action space, has attracted great attention in reinforcement learning. However, the exploration strategy, based on dynamic programming within the Bayesian belief state space, is rather inefficient even for simple systems. A second problem is that training data for autonomous vehicles are sequential and iterative, subject to the law of causality, which violates the i.i.d. (independent and identically distributed) assumption on the training samples. This usually causes the standard bootstrap to fail when learning an optimal policy. In this paper, we propose a framework of m-out-of-n bootstrapped and aggregated multiple deep deterministic policy gradient to accelerate training and improve performance. Experimental results on the 2D robot arm game show that the reward gained by the aggregated policy is 10%–50% better than those gained by the subpolicies. Experimental results on The Open Racing Car Simulator (TORCS) demonstrate that the new algorithm learns successful control policies with 56.7% less training time. A convergence analysis is also given from the perspective of probability and statistics. These results verify that the proposed method outperforms existing algorithms in both efficiency and performance.
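The two ingredients named in the abstract, m-out-of-n bootstrap resampling and aggregation of several subpolicies, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `m_out_of_n_sample` and `aggregate_action` are hypothetical, the subpolicies are stand-in functions rather than trained DDPG actors, and aggregation by simple averaging of the subpolicy actions is an assumption, since the abstract does not state the aggregation rule.

```python
import random
from typing import Callable, List, Sequence

Action = List[float]
Policy = Callable[[Sequence[float]], Action]


def m_out_of_n_sample(buffer: Sequence, m: int, rng: random.Random) -> list:
    """Draw m items with replacement from a buffer of n items, with m <= n."""
    n = len(buffer)
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return [buffer[rng.randrange(n)] for _ in range(m)]


def aggregate_action(subpolicies: Sequence[Policy], state: Sequence[float]) -> Action:
    """Average the actions proposed by the subpolicies (assumed aggregation rule)."""
    actions = [pi(state) for pi in subpolicies]
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = list(range(100))  # stand-in for n = 100 stored transitions
    # One resampled training set of size m = 30 per subpolicy.
    batches = [m_out_of_n_sample(buffer, m=30, rng=rng) for _ in range(3)]
    # Toy subpolicies mapping a 1-D state to a 2-D action.
    subpolicies = [lambda s, k=k: [k * s[0], -k * s[0]] for k in (1.0, 2.0, 3.0)]
    print(aggregate_action(subpolicies, [0.5]))  # prints [1.0, -1.0]
```

In a real setting each resampled batch would train one DDPG actor-critic pair, and the averaged action would be the one sent to the environment.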
Copyright	Copyright © 2020 Junta Wu and Huiyun Li. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://creativecommons.org/licenses/by/4.0
Editor	Jędrzejowicz, Piotr
Funding	National Natural Science Foundation of China (grant nos. 61672512; 51707191); Shenzhen Engineering Laboratory for Autonomous Driving Technology; CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems; Shenzhen Institutes of Advanced Technology
ORCID	0000-0003-0157-1393
OpenAccessLink	https://dx.doi.org/10.1155/2020/4275623