Learning to Teach Reinforcement Learning Agents

Bibliographic Details
Published in: Machine Learning and Knowledge Extraction, Vol. 1, No. 1, pp. 21-42
Main Authors: Fachantidis, Anestis; Taylor, Matthew; Vlahavas, Ioannis
Format: Journal Article
Language: English
Published: 01.12.2019
ISSN/EISSN: 2504-4990
DOI: 10.3390/make1010002
License: https://creativecommons.org/licenses/by/4.0
Full text: https://www.mdpi.com/2504-4990/1/1/2/pdf?version=1545200511

Abstract: In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student’s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
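The abstract defines the coefficient of variation as the statistic relating a policy's variance in returns to its mean. Below is a minimal sketch of that statistic and of ranking candidate teacher policies by it; the helper names, the lowest-CV selection rule, and the sample scores are illustrative assumptions, not the authors' code.

```python
import numpy as np

def coefficient_of_variation(returns):
    """CV = sample standard deviation / mean of a policy's episode returns.

    A lower CV indicates more consistent performance relative to the
    average, which the abstract flags as relevant for choosing advising
    policies (hypothetical helper, not the authors' implementation).
    """
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean()
    if mean == 0:
        return float("inf")  # CV is undefined at zero mean; treat as worst case
    return returns.std(ddof=1) / abs(mean)

def pick_teacher(candidates):
    """Select the candidate policy with the lowest CV of returns.

    `candidates` maps a policy name to a list of episode scores,
    e.g. Pac-Man game scores collected during evaluation runs.
    """
    return min(candidates, key=lambda name: coefficient_of_variation(candidates[name]))

# Illustrative data: 'peaky' has the higher mean score, but 'steady'
# wins under the CV criterion because its returns barely vary.
candidates = {
    "peaky": [900, 100, 950, 50, 1000],   # high mean, high variance
    "steady": [600, 620, 590, 610, 605],  # lower mean, low variance
}
print(pick_teacher(candidates))  # -> "steady"
```

The illustrative data makes the abstract's point concrete: the best performer by mean score is not necessarily the best teacher once consistency is taken into account.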
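The abstract also frames advice distribution as a learning problem: the teacher learns when to spend its budget, even without knowing the student's intended action. The sketch below shows one plain way such a teacher could be set up as a Q-learner over {advise, pass} decisions; the state features, reward signal, and hyperparameters are assumptions for illustration and do not reproduce the algorithm proposed in the article.

```python
import random
from collections import defaultdict

# Sketch: the teacher decides ADVISE vs. PASS with tabular Q-learning.
ADVISE, PASS = 0, 1
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # illustrative hyperparameters

Q = defaultdict(lambda: [0.0, 0.0])  # teacher_state -> [Q(ADVISE), Q(PASS)]

def teacher_state(student_obs, budget_left):
    """The teacher sees the student's observation plus its own remaining
    budget -- notably, not the student's intended action."""
    return (student_obs, budget_left)

def choose(state):
    _, budget_left = state
    if budget_left == 0:
        return PASS  # budget exhausted: advising is no longer possible
    if random.random() < EPSILON:
        return random.choice([ADVISE, PASS])  # epsilon-greedy exploration
    return ADVISE if Q[state][ADVISE] >= Q[state][PASS] else PASS

def update(state, action, reward, next_state):
    """Standard Q-learning backup on the teacher's decision problem."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
```

At each student step the teacher would call choose(), decrement its budget when it advises, observe a reward tied to the student's progress, and call update(); over episodes it learns in which situations spending budget pays off, which is the constrained-exploitation flavor the abstract names.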