Meta-inverse Reinforcement Learning Method Based on Relative Entropy
Published in | Ji suan ji ke xue Vol. 48; no. 9; pp. 257 - 263
Main Author | WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You
Format | Journal Article
Language | Chinese
Published | Editorial office of Computer Science, 01.09.2021
Subjects | inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent
Online Access | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a
Abstract | Aiming at the problem that traditional inverse reinforcement learning algorithms are slow, imprecise, or even unsolvable when solving the reward function owing to insufficient expert demonstration samples and unknown state transition probabilities, a meta-inverse reinforcement learning method based on relative entropy is proposed. Using meta-learning methods, a learning prior for the target task is constructed by integrating a set of meta-training sets that follow the same distribution as the target task. In the model-free reinforcement learning problem, the relative entropy probability model is used to model the reward function and is combined with this prior, so that the reward function of the target task can be solved quickly from a small number of target-task samples. The proposed algorithm and the RE IRL algorithm are applied to the classic Gridworld and Object World problems. Experiments show that the proposed algorithm can still solve the reward function better when the target task lacks a sufficient number of expert demonstration samples.
Author | WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You |
Author affiliations | 1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
DOI | 10.11896/jsjkx.200700044 |
DatabaseName | DOAJ Directory of Open Access Journals |
Discipline | Computer Science |
EndPage | 263 |
ISSN | 1002-137X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 9 |
Language | Chinese |
OpenAccessLink | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a |
PublicationDate | 2021-09-01 |
PublicationTitle | Ji suan ji ke xue |
PublicationYear | 2021 |
Publisher | Editorial office of Computer Science |
StartPage | 257 |
SubjectTerms | inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent
Title | Meta-inverse Reinforcement Learning Method Based on Relative Entropy |
URI | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a |
Volume | 48 |
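The abstract describes relative-entropy inverse reinforcement learning combined with a meta-learned prior: a linear reward is fitted to a few target-task demonstrations, with a prior built from meta-training tasks drawn from the same distribution. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a linear reward r(s) = w·φ(s), trajectories given as lists of state indices, a feature matrix `phi` indexed by state, an averaged-weights prior, and a quadratic pull toward that prior; all function names and hyperparameters are illustrative.

```python
import numpy as np

def traj_features(traj, phi):
    """Sum of per-state features along a trajectory (traj is a list of state indices)."""
    return phi[traj].sum(axis=0)

def re_irl_gradient(w, expert_trajs, sampled_trajs, phi):
    """Importance-sampled gradient of a relative-entropy IRL objective:
    expert feature expectations minus feature expectations under a
    reweighting of trajectories drawn from a baseline sampling policy."""
    f_expert = np.mean([traj_features(t, phi) for t in expert_trajs], axis=0)
    f_sampled = np.array([traj_features(t, phi) for t in sampled_trajs])
    logits = f_sampled @ w                 # log importance weights, up to a constant
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return f_expert - p @ f_sampled

def fit_reward(expert_trajs, sampled_trajs, phi, w_prior=None,
               lam=0.1, lr=0.05, iters=300):
    """Gradient ascent on the reward weights; when a meta-learned prior is
    given, start from it and add a quadratic penalty pulling w toward it,
    so that a handful of target-task demonstrations can suffice."""
    w = np.zeros(phi.shape[1]) if w_prior is None else w_prior.astype(float).copy()
    for _ in range(iters):
        g = re_irl_gradient(w, expert_trajs, sampled_trajs, phi)
        if w_prior is not None:
            g -= lam * (w - w_prior)       # regularize toward the meta-learned prior
        w += lr * g
    return w

def meta_prior(per_task_weights):
    """Combine reward weights fitted on the meta-training tasks into a
    single prior for the target task (here simply their mean)."""
    return np.mean(per_task_weights, axis=0)
```

On a Gridworld-style task, `phi` could be one-hot state indicators, `expert_trajs` a few demonstrations on the target task, and `sampled_trajs` trajectories from a random baseline policy; `meta_prior` would be fed the weights fitted separately on each meta-training task before calling `fit_reward` on the target task.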