Meta-inverse Reinforcement Learning Method Based on Relative Entropy


Bibliographic Details
Published in: Ji suan ji ke xue (Computer Science), Vol. 48, No. 9, pp. 257-263
Main Authors: WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You
Format: Journal Article
Language: Chinese
Published: Editorial office of Computer Science, 2021-09-01

Abstract: Aiming at the problem that traditional inverse reinforcement learning algorithms are slow, imprecise, or even unsolvable when solving the reward function owing to insufficient expert demonstration samples and unknown state transition probabilities, a meta-inverse reinforcement learning method based on relative entropy is proposed. Using meta-learning methods, a learning prior for the target task is constructed by integrating a set of meta-training tasks that follow the same distribution as the target task. In the model-free reinforcement learning setting, the relative entropy probability model is used to model the reward function and is combined with this prior, so that the reward function of the target task can be solved quickly from a small number of target-task samples. The proposed algorithm and the RE IRL algorithm are applied to the classic Gridworld and Object World problems. Experiments show that the proposed algorithm can still solve the reward function well when the target task lacks a sufficient number of expert demonstration samples.
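For orientation, the relative entropy IRL machinery the abstract builds on (standard RE-IRL in the sense of Boularias et al., 2011; the paper's exact notation may differ): the reward is assumed linear in trajectory features, R(\tau) = \theta^{\top} f(\tau), and the learned trajectory distribution P is the one closest in KL divergence (relative entropy) to a baseline distribution Q induced by a sampling policy, subject to matching the expert's empirical feature expectation \hat{f}. The solution takes the form

  P(\tau) \propto Q(\tau)\, \exp\big(\theta^{\top} f(\tau)\big),

and \theta is obtained by gradient ascent on the dual, whose gradient can be estimated from N sampled trajectories with importance weights w_i \propto \exp(\theta^{\top} f(\tau_i)):

  \nabla_{\theta} L(\theta) = \hat{f} - \frac{\sum_{i=1}^{N} w_i\, f(\tau_i)}{\sum_{i=1}^{N} w_i}.

A minimal Python sketch of how a meta-learned prior could enter this update, assuming the prior is a weight vector theta_prior distilled from the meta-training tasks and used both as the initialization and as a regularizer (the function names and the regularization scheme are illustrative, not the paper's exact algorithm):

  import numpy as np

  def reirl_gradient(theta, expert_feat_mean, sampled_feats):
      # Importance weights w_i proportional to exp(theta . f(tau_i)),
      # computed with the max subtracted for numerical stability.
      logits = sampled_feats @ theta
      w = np.exp(logits - logits.max())
      w /= w.sum()
      # Dual gradient: expert feature mean minus weighted sample feature mean.
      return expert_feat_mean - w @ sampled_feats

  def solve_target_task(expert_feat_mean, sampled_feats, theta_prior,
                        lam=1.0, lr=0.1, iters=200):
      # Start from the meta-learned prior and pull theta back toward it,
      # so a small number of target-task demonstrations suffices.
      theta = theta_prior.copy()
      for _ in range(iters):
          g = reirl_gradient(theta, expert_feat_mean, sampled_feats)
          theta = theta + lr * (g - lam * (theta - theta_prior))
      return theta

One plausible reading of "integrating a set of meta-training sets" is to run the same solver on each meta-training task with a zero prior and average the resulting weight vectors into theta_prior; the paper may combine the tasks differently.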
Author affiliations:
1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
DOI: 10.11896/jsjkx.200700044
Database: DOAJ Directory of Open Access Journals (https://www.doaj.org/)
Discipline: Computer Science
ISSN: 1002-137X
Open Access Link: https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a
Subjects: inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent