Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

Bibliographic Details
Published in: Proceedings of the International Conference on Automated Planning and Scheduling, Vol. 34, pp. 532–540
Main Authors: Shukla, Yash; Burman, Tanushree; Kulkarni, Abhishek N.; Wright, Robert; Velasquez, Alvaro; Sinapov, Jivko
Format: Journal Article
Language: English
Published: 2024-05-30
DOI: 10.1609/icaps.v34i1.31514
ISSN: 2334-0835
EISSN: 2334-0843

Abstract
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic over finite traces (LTLf) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from Logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.
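The abstract describes a loop in which a high-level specification is decomposed into sub-tasks and a teacher dynamically picks which sub-task the agent trains on next. The record does not include the paper's actual sampling rule, so the sketch below is only a minimal illustration of the general idea, assuming a UCB-style teacher over the edges of a task automaton; the class `SubTask`, the functions `sample_task` and `train_one_round`, and the toy success probabilities are all hypothetical and not taken from the paper.

```python
import math
import random

class SubTask:
    """One automaton edge (source state -> target state) treated as an RL task."""
    def __init__(self, name):
        self.name = name
        self.attempts = 0
        self.successes = 0

    def success_rate(self):
        return self.successes / self.attempts if self.attempts else 0.0

def ucb_score(task, total_attempts, c=1.4):
    """UCB-style score: favor sub-tasks that look promising or are under-explored."""
    if task.attempts == 0:
        return float("inf")  # always try an unseen sub-task first
    bonus = c * math.sqrt(math.log(total_attempts) / task.attempts)
    return task.success_rate() + bonus

def sample_task(tasks):
    """Dynamically pick the next sub-task to train on."""
    total = sum(t.attempts for t in tasks) or 1
    return max(tasks, key=lambda t: ucb_score(t, total))

def train_one_round(task):
    """Toy stand-in for running RL episodes on the sub-task; a real system
    would train the policy attached to this automaton edge and report
    whether the edge's goal condition was reached."""
    true_difficulty = {"reach_key": 0.8, "open_door": 0.5, "reach_goal": 0.3}
    success = random.random() < true_difficulty[task.name]
    task.attempts += 1
    task.successes += int(success)
    return success

if __name__ == "__main__":
    # Edges of a simple sequential automaton: key -> door -> goal.
    tasks = [SubTask("reach_key"), SubTask("open_door"), SubTask("reach_goal")]
    for _ in range(200):
        train_one_round(sample_task(tasks))
    for t in tasks:
        print(f"{t.name}: {t.attempts} attempts, {t.success_rate():.2f} success rate")
```

In an LSTS-style system each training round would run actual RL episodes for the sampled sub-task's policy, and sub-tasks starting from automaton states the agent cannot yet reach would only become worth sampling once a policy for a predecessor edge succeeds; the UCB rule here is just one plausible way to trade off exploiting promising sub-tasks against exploring neglected ones.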