Learning to Teach Reinforcement Learning Agents
Published in | Machine learning and knowledge extraction Vol. 1; no. 1; pp. 21 - 42 |
Main Authors | Fachantidis, Anestis; Taylor, Matthew; Vlahavas, Ioannis |
Format | Journal Article |
Language | English |
Published | 01.12.2019 |
Online Access | https://www.mdpi.com/2504-4990/1/1/2/pdf?version=1545200511 |
License | https://creativecommons.org/licenses/by/4.0 |
ISSN | 2504-4990 |
EISSN | 2504-4990 |
DOI | 10.3390/make1010002 |
Abstract | In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student’s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning. |
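The coefficient of variation (CV) mentioned in the abstract is simply the ratio of a policy's return standard deviation to its mean return. Below is a minimal sketch of how candidate teacher policies could be ranked by this statistic; the evaluation data, policy names, and the "lower CV first" ordering are illustrative assumptions, not code or thresholds from the article.

```python
import statistics

def coefficient_of_variation(returns):
    """CV = standard deviation of episode returns divided by their mean."""
    mean = statistics.mean(returns)
    if mean == 0:
        return float("inf")  # degenerate case: treat as the worst candidate
    return statistics.stdev(returns) / abs(mean)

def rank_teachers_by_cv(candidate_returns):
    """candidate_returns maps a policy name to a list of evaluation-episode returns.
    Policies are ordered from lowest CV (most consistent relative to its own mean)
    to highest; the article argues this relative-variability view matters when
    choosing which policy should generate advice."""
    return sorted(candidate_returns,
                  key=lambda name: coefficient_of_variation(candidate_returns[name]))

# Illustrative numbers only: a steady policy ranks ahead of an erratic one
# even though the erratic one reaches higher peak scores.
scores = {
    "steady_policy":  [2400, 2500, 2450, 2480],
    "erratic_policy": [3000, 900, 3100, 1000],
}
print(rank_teachers_by_cv(scores))  # ['steady_policy', 'erratic_policy']
```

The made-up numbers show how a consistent mid-range policy can rank ahead of an erratic high-scoring one, which is the intuition behind the finding that the best performers are not always the best teachers.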
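The abstract also frames deciding when to spend a limited advice budget as a reinforcement learning problem in its own right. The sketch below casts that decision as a small tabular Q-learning task with two actions, advise or pass; the state features, budget bucketing, reward signal, and hyperparameters are assumptions made for illustration and are not the algorithm proposed in the article.

```python
import random
from collections import defaultdict

ADVISE, PASS = 0, 1

class AdvisingTeacher:
    """Tabular Q-learning over (discretised game features, remaining-budget bucket).

    At every student step the teacher either spends one unit of budget to send
    its own action (ADVISE) or stays silent (PASS). This is an illustrative
    formulation only; it does not require knowing the student's intended action."""

    def __init__(self, budget, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)      # (state, action) -> estimated value
        self.budget = budget
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _state(self, features):
        # Hashable game features plus a coarse bucket of the remaining budget.
        return (features, min(self.budget, 10))

    def act(self, features):
        s = self._state(features)
        if self.budget <= 0:
            a = PASS
        elif random.random() < self.epsilon:
            a = random.choice([ADVISE, PASS])
        else:
            a = ADVISE if self.q[(s, ADVISE)] >= self.q[(s, PASS)] else PASS
        if a == ADVISE:
            self.budget -= 1
        self._last = (s, a)              # remember the decision for the update
        return a

    def update(self, reward, next_features):
        s, a = self._last
        s2 = self._state(next_features)  # budget already reflects any spent advice
        best_next = max(self.q[(s2, ADVISE)], self.q[(s2, PASS)])
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])
```

A driver loop would call act() once per student step, forward the teacher's own greedy Pac-Man action whenever ADVISE is returned, and then call update() with whatever reward is chosen to score the advising decision (for example, the change in the student's score).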