Improving Exploration in Deep Reinforcement Learning for Incomplete Information Competition Environments
Published in | IEEE Transactions on Emerging Topics in Computational Intelligence, pp. 1-12 |
Main Authors | Lin, Jie; Ye, Yuhao; Li, Shaobo; Zhang, Hanlin; Zhao, Peng |
Format | Journal Article |
Language | English |
Published | IEEE, 2025 |
Subjects | Action exploration; Computational intelligence; Convergence; Deep reinforcement learning; Games; Incomplete information competition environments; Neural networks; Object recognition; Q-learning; Reviews; Reward design; Silicon carbide; Sparse reward; Training |
ISSN | 2471-285X |
DOI | 10.1109/TETCI.2025.3555250 |
Abstract | The sparse reward problem widely exists in multi-agent deep reinforcement learning, preventing agents from learning optimal actions and resulting in inefficient interactions with the environment. Many efforts have been made to design denser rewards and promote agent exploration. However, existing methods focus only on the breadth of action exploration, neglecting its rationality, which leads to inefficient action exploration for agents. To address this issue, we propose a novel curiosity-based action exploration method for incomplete information competition game environments, namely IGC, to improve both the breadth and rationality of action exploration in multi-agent deep reinforcement learning for sparse-reward environments. In particular, to enhance the action exploration capability of agents, a distance reward is designed in our IGC method to increase the density of rewards during action exploration, thereby mitigating the sparse reward problem. In addition, by integrating the Intrinsic Curiosity Module (ICM) into DQN, we propose an enhanced ICM-DQN module, which improves the breadth and rationality of action exploration for agents. In this way, our IGC method mitigates the randomness of the existing curiosity mechanism and increases the rationality of agents' action exploration, thereby enhancing exploration efficiency. Finally, we evaluate the effectiveness of our IGC method on an incomplete information card game, namely the Uno card game. The results demonstrate that our IGC method achieves both better action exploration efficiency and a higher winning rate in comparison with existing methods. |
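The two ingredients named in the abstract, a distance-based shaping reward and an ICM-style curiosity bonus added on top of DQN, can be illustrated with a short sketch. The PyTorch code below is not the authors' IGC implementation; the network sizes, the `dist_fn` progress metric, and the weights `eta` and `beta` are all illustrative assumptions. It shows only the standard pattern of adding a forward-model prediction error (curiosity) and a distance-progress term to the sparse extrinsic reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, FEAT_DIM = 64, 8, 32  # illustrative sizes

class ICM(nn.Module):
    """Intrinsic Curiosity Module (Pathak et al. style): the intrinsic
    reward is the forward model's prediction error in feature space."""
    def __init__(self):
        super().__init__()
        # Encoder maps raw states to a learned feature space.
        self.encoder = nn.Sequential(nn.Linear(STATE_DIM, FEAT_DIM), nn.ReLU())
        # Forward model predicts next-state features from (features, action).
        self.forward_model = nn.Linear(FEAT_DIM + ACTION_DIM, FEAT_DIM)
        # Inverse model predicts the action from consecutive features; training
        # it keeps the encoder focused on agent-controllable dynamics.
        self.inverse_model = nn.Linear(2 * FEAT_DIM, ACTION_DIM)

    def curiosity(self, s, a_onehot, s_next):
        phi, phi_next = self.encoder(s), self.encoder(s_next)
        phi_pred = self.forward_model(torch.cat([phi, a_onehot], dim=-1))
        # Novelty signal: poorly predicted transitions earn a larger bonus.
        return 0.5 * (phi_pred - phi_next.detach()).pow(2).sum(dim=-1)

    def losses(self, s, a_onehot, s_next):
        phi, phi_next = self.encoder(s), self.encoder(s_next)
        phi_pred = self.forward_model(torch.cat([phi, a_onehot], dim=-1))
        fwd_loss = F.mse_loss(phi_pred, phi_next.detach())
        a_logits = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        inv_loss = F.cross_entropy(a_logits, a_onehot.argmax(dim=-1))
        return fwd_loss, inv_loss

def shaped_reward(r_ext, s, a_onehot, s_next, icm, dist_fn, eta=0.01, beta=0.1):
    """Total reward = extrinsic + curiosity bonus + distance shaping.

    dist_fn(s) - dist_fn(s_next) rewards progress toward a goal state,
    densifying a sparse extrinsic reward; the actual metric behind the
    paper's 'distance reward' is an assumption here."""
    with torch.no_grad():
        r_int = eta * icm.curiosity(s, a_onehot, s_next)
        r_dist = beta * (dist_fn(s) - dist_fn(s_next))
    return r_ext + r_int + r_dist
```

In a DQN training loop, `shaped_reward` would replace the raw environment reward when transitions are written to the replay buffer, while `ICM.losses` would be optimized jointly with the Q-network so the encoder keeps tracking action-relevant state features.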
Authors |
– Lin, Jie (ORCID 0000-0003-3476-110X), Xi'an Jiaotong University, Xi'an, China; jielin@mail.xjtu.edu.cn
– Ye, Yuhao, Xi'an Jiaotong University, Xi'an, China; yeyuhao0607@stu.xjtu.edu.cn
– Li, Shaobo (ORCID 0009-0003-8470-010X), Xi'an Jiaotong University, Xi'an, China; 4122151031@stu.xjtu.edu.cn
– Zhang, Hanlin (ORCID 0000-0001-8869-6863), Qingdao University, Qingdao, China; hanlin@qdu.edu.cn
– Zhao, Peng (ORCID 0000-0001-7033-9315), Xi'an Jiaotong University, Xi'an, China; p.zhao@mail.xjtu.edu.cn |
CODEN | ITETCU |
EISSN | 2471-285X |
Genre | orig-research |
GrantInformation |
– Open-End Foundation of National Key Laboratory of Air-based Information Perception and Fusion 6A
– Aviation Foundation of National Key Laboratory of Air-based Information Perception and Fusion (grant ASFC-20240001070002) |
ISSN | 2471-285X |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
PublicationTitle | IEEE transactions on emerging topics in computational intelligence |
PublicationTitleAbbrev | TETCI |
PublicationYear | 2025 |
Publisher | IEEE |
URI | https://ieeexplore.ieee.org/document/10964687 |