Learning to Teach Reinforcement Learning Agents
Published in | Machine learning and knowledge extraction Vol. 1; no. 1; pp. 21 - 42 |
Main Authors | Fachantidis, Anestis; Taylor, Matthew; Vlahavas, Ioannis |
Format | Journal Article |
Language | English |
Published | 01.12.2019 |
Online Access | https://www.mdpi.com/2504-4990/1/1/2/pdf?version=1545200511 |
License | https://creativecommons.org/licenses/by/4.0 |
ISSN | 2504-4990 |
EISSN | 2504-4990 |
DOI | 10.3390/make1010002 |
Abstract | In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student’s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning. |
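The coefficient of variation (CV) mentioned in the abstract is simply the ratio of a policy's return standard deviation to its mean return. Below is a minimal sketch of how candidate teacher policies could be ranked by this statistic; the evaluation data, policy names, and the "lower CV first" ordering are illustrative assumptions, not code or thresholds from the article.

```python
import statistics

def coefficient_of_variation(returns):
    """CV = standard deviation of episode returns divided by their mean."""
    mean = statistics.mean(returns)
    if mean == 0:
        return float("inf")  # degenerate case: treat as the worst candidate
    return statistics.stdev(returns) / abs(mean)

def rank_teachers_by_cv(candidate_returns):
    """candidate_returns maps a policy name to a list of evaluation-episode returns.
    Policies are ordered from lowest CV (most consistent relative to its own mean)
    to highest; the article argues this relative-variability view matters when
    choosing which policy should generate advice."""
    return sorted(candidate_returns,
                  key=lambda name: coefficient_of_variation(candidate_returns[name]))

# Illustrative numbers only: a steady policy ranks ahead of an erratic one
# even though the erratic one reaches higher peak scores.
scores = {
    "steady_policy":  [2400, 2500, 2450, 2480],
    "erratic_policy": [3000, 900, 3100, 1000],
}
print(rank_teachers_by_cv(scores))  # ['steady_policy', 'erratic_policy']
```

The made-up numbers show how a consistent mid-range policy can rank ahead of an erratic high-scoring one, which is the intuition behind the finding that the best performers are not always the best teachers.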
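The abstract also frames deciding when to spend a limited advice budget as a reinforcement learning problem in its own right. The sketch below casts that decision as a small tabular Q-learning task with two actions, advise or pass; the state features, budget bucketing, reward signal, and hyperparameters are assumptions made for illustration and are not the algorithm proposed in the article.

```python
import random
from collections import defaultdict

ADVISE, PASS = 0, 1

class AdvisingTeacher:
    """Tabular Q-learning over (discretised game features, remaining-budget bucket).

    At every student step the teacher either spends one unit of budget to send
    its own action (ADVISE) or stays silent (PASS). This is an illustrative
    formulation only; it does not require knowing the student's intended action."""

    def __init__(self, budget, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)      # (state, action) -> estimated value
        self.budget = budget
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _state(self, features):
        # Hashable game features plus a coarse bucket of the remaining budget.
        return (features, min(self.budget, 10))

    def act(self, features):
        s = self._state(features)
        if self.budget <= 0:
            a = PASS
        elif random.random() < self.epsilon:
            a = random.choice([ADVISE, PASS])
        else:
            a = ADVISE if self.q[(s, ADVISE)] >= self.q[(s, PASS)] else PASS
        if a == ADVISE:
            self.budget -= 1
        self._last = (s, a)              # remember the decision for the update
        return a

    def update(self, reward, next_features):
        s, a = self._last
        s2 = self._state(next_features)  # budget already reflects any spent advice
        best_next = max(self.q[(s2, ADVISE)], self.q[(s2, PASS)])
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])
```

A driver loop would call act() once per student step, forward the teacher's own greedy Pac-Man action whenever ADVISE is returned, and then call update() with whatever reward is chosen to score the advising decision (for example, the change in the student's score).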