Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm

Bibliographic Details
Published in Mathematical Problems in Engineering Vol. 2020; no. 2020; pp. 1-12
Main Authors Wu, Junta; Li, Huiyun
Format Journal Article
Language English
Published Cairo, Egypt Hindawi Publishing Corporation 2020
Hindawi
John Wiley & Sons, Inc
ISSN 1024-123X
1563-5147
DOI 10.1155/2020/4275623

Abstract The deep deterministic policy gradient (DDPG) algorithm, which operates over a continuous action space, has attracted great attention in reinforcement learning. However, exploration through dynamic programming within the Bayesian belief state space is rather inefficient, even for simple systems. Another problem is that the training data for autonomous vehicles are sequential and iterative and subject to the law of causality, which violates the i.i.d. (independent and identically distributed) assumption on the training samples. This usually causes the standard bootstrap to fail when learning an optimal policy. In this paper, we propose a framework of m-out-of-n bootstrapped and aggregated multiple deep deterministic policy gradient to accelerate training and improve performance. Experimental results on the 2D robot arm game show that the reward gained by the aggregated policy is 10%–50% better than those gained by the subpolicies. Experimental results on The Open Racing Car Simulator (TORCS) demonstrate that the new algorithm can learn successful control policies with 56.7% less training time. A convergence analysis is also given from the perspective of probability and statistics. These results verify that the proposed method outperforms existing algorithms in both efficiency and performance.
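
The two ideas named in the abstract, m-out-of-n bootstrap sampling and aggregation of several subpolicies, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the aggregation rule, so element-wise averaging of actions is an assumption here, and the names `m_out_of_n_sample`, `aggregate_action`, `Transition`, and `Policy` are hypothetical. Simple lambda functions stand in for trained DDPG actors.

```python
import random
from typing import Callable, List, Sequence

Transition = tuple  # hypothetical (state, action, reward, next_state) record
Policy = Callable[[Sequence[float]], List[float]]  # maps a state to a continuous action


def m_out_of_n_sample(buffer: Sequence[Transition], m: int, rng: random.Random) -> List[Transition]:
    """Draw m transitions with replacement from a buffer of n transitions (m <= n)."""
    n = len(buffer)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    return [buffer[rng.randrange(n)] for _ in range(m)]


def aggregate_action(policies: Sequence[Policy], state: Sequence[float]) -> List[float]:
    """Combine subpolicy outputs by element-wise averaging (an assumed rule)."""
    actions = [policy(state) for policy in policies]
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]


# Toy usage: each subpolicy would be trained on its own resampled minibatch.
rng = random.Random(0)
buffer = [(float(t), 0.0, 1.0, float(t + 1)) for t in range(100)]  # n = 100
minibatches = [m_out_of_n_sample(buffer, m=32, rng=rng) for _ in range(3)]

# Stand-in subpolicies (not trained actors): scaled copies of the state.
subpolicies = [lambda s, k=k: [k * s[0], -k * s[0]] for k in (1.0, 2.0, 3.0)]
print(aggregate_action(subpolicies, [0.5]))
```

Drawing m < n samples per subpolicy is what distinguishes this scheme from the standard n-out-of-n bootstrap, which the abstract says tends to fail on non-i.i.d. data; how m is chosen in the paper is not given in this record.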
Copyright Copyright © 2020 Junta Wu and Huiyun Li.
Copyright © 2020 Junta Wu and Huiyun Li. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://creativecommons.org/licenses/by/4.0
Discipline Applied Sciences
Editor Jędrzejowicz, Piotr
Funding National Natural Science Foundation of China (grants 61672512, 51707191)
Shenzhen Engineering Laboratory for Autonomous Driving Technology
CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems
Shenzhen Institutes of Advanced Technology
Open Access Yes
Peer Reviewed Yes
ORCID 0000-0003-0157-1393
OpenAccessLink https://dx.doi.org/10.1155/2020/4275623
Subjects Algorithms
Computer simulation
Dynamic programming
Efficiency
Interactive learning
Machine learning
Mathematical problems
Methods
Neural networks
Race cars
Robot arms
Statistical methods
Training