Meta-inverse Reinforcement Learning Method Based on Relative Entropy
Published in | Ji suan ji ke xue Vol. 48; no. 9; pp. 257 - 263
Main Author | WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You
Format | Journal Article
Language | Chinese
Published | Editorial office of Computer Science, 01.09.2021
Subjects | inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent
Online Access | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a
Abstract | Aiming at the problem that traditional inverse reinforcement learning algorithms are slow, imprecise, or even unsolvable when solving the reward function owing to insufficient expert demonstration samples and unknown state transition probabilities, a meta-inverse reinforcement learning method based on relative entropy is proposed. Using meta-learning methods, a learning prior for the target task is constructed by integrating a set of meta-training sets that follow the same distribution as the target task. In the model-free reinforcement learning problem, the relative entropy probability model is used to model the reward function and is combined with this prior, so that the reward function of the target task can be solved quickly from a small number of target-task samples. The proposed algorithm and the RE IRL algorithm are applied to the classic Gridworld and Object World problems. Experiments show that the proposed algorithm can still solve the reward function better when the target task lacks a sufficient number of expert demonstration samples.
Author | WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You |
Author affiliations | 1. School of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
DOI | 10.11896/jsjkx.200700044 |
DatabaseName | DOAJ Directory of Open Access Journals |
Discipline | Computer Science |
EndPage | 263 |
ISSN | 1002-137X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 9 |
Language | Chinese |
OpenAccessLink | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a |
PublicationDate | 2021-09-01 |
PublicationTitle | Ji suan ji ke xue |
PublicationYear | 2021 |
Publisher | Editorial office of Computer Science |
StartPage | 257 |
SubjectTerms | inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent
Title | Meta-inverse Reinforcement Learning Method Based on Relative Entropy |
URI | https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a |
Volume | 48 |
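The abstract describes relative-entropy inverse reinforcement learning combined with a meta-learned prior: a linear reward is fitted to a few target-task demonstrations, with a prior built from meta-training tasks drawn from the same distribution. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a linear reward r(s) = w·φ(s), trajectories given as lists of state indices, a feature matrix `phi` indexed by state, an averaged-weights prior, and a quadratic pull toward that prior; all function names and hyperparameters are illustrative.

```python
import numpy as np

def traj_features(traj, phi):
    """Sum of per-state features along a trajectory (traj is a list of state indices)."""
    return phi[traj].sum(axis=0)

def re_irl_gradient(w, expert_trajs, sampled_trajs, phi):
    """Importance-sampled gradient of a relative-entropy IRL objective:
    expert feature expectations minus feature expectations under a
    reweighting of trajectories drawn from a baseline sampling policy."""
    f_expert = np.mean([traj_features(t, phi) for t in expert_trajs], axis=0)
    f_sampled = np.array([traj_features(t, phi) for t in sampled_trajs])
    logits = f_sampled @ w                 # log importance weights, up to a constant
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return f_expert - p @ f_sampled

def fit_reward(expert_trajs, sampled_trajs, phi, w_prior=None,
               lam=0.1, lr=0.05, iters=300):
    """Gradient ascent on the reward weights; when a meta-learned prior is
    given, start from it and add a quadratic penalty pulling w toward it,
    so that a handful of target-task demonstrations can suffice."""
    w = np.zeros(phi.shape[1]) if w_prior is None else w_prior.astype(float).copy()
    for _ in range(iters):
        g = re_irl_gradient(w, expert_trajs, sampled_trajs, phi)
        if w_prior is not None:
            g -= lam * (w - w_prior)       # regularize toward the meta-learned prior
        w += lr * g
    return w

def meta_prior(per_task_weights):
    """Combine reward weights fitted on the meta-training tasks into a
    single prior for the target task (here simply their mean)."""
    return np.mean(per_task_weights, axis=0)
```

On a Gridworld-style task, `phi` could be one-hot state indicators, `expert_trajs` a few demonstrations on the target task, and `sampled_trajs` trajectories from a random baseline policy; `meta_prior` would be fed the weights fitted separately on each meta-training task before calling `fit_reward` on the target task.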