An Active Learning Framework for Efficient Robust Policy Search

Bibliographic Details
Published in	arXiv.org
Main Authors	Narayanaswami, Sai Kiran; Sudarsanam, Nandan; Ravindran, Balaraman
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 21.11.2021
Subjects	Active learning; Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Robotics; Computer simulation; Control tasks; Environment models; Learning; Mathematical models; Parameters; Performance degradation; Policies; Robustness; Searching; Statistics - Machine Learning
Abstract	Robust Policy Search is the problem of learning policies whose performance does not degrade under unseen environment model parameters. It is particularly relevant for transferring policies learned in simulation to the real world. Several existing approaches sample large batches of trajectories that reflect the differences among possible environments, and then select a subset of these, such as the trajectories with the worst performance, from which to learn robust policies. We propose an active-learning-based framework, EffAcTS, that selectively chooses model parameters for this purpose, so that only as much data is collected as is necessary to select such a subset. We apply this framework using Linear Bandits and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective on the problem of Robust Policy Search and draw connections between our proposed framework and existing work on Multi-Task Learning.
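As a rough illustration of the baseline that the abstract attributes to existing approaches (sample many environment parameters, then keep the worst-performing subset), the following Python sketch may help. It is not EffAcTS and not the authors' code: the function simulate_return, the parameter range, the noise level, and the 10% fraction are all assumptions made for illustration only.

```python
import numpy as np

def simulate_return(param, rng):
    """Hypothetical stand-in for a policy rollout: the return drops as the
    environment parameter moves away from 0, plus a little noise."""
    return -abs(param) + 0.01 * rng.standard_normal()

def worst_case_subset(params, returns, fraction=0.1):
    """Return the parameters (and returns) in the lowest `fraction` of returns."""
    k = max(1, int(np.ceil(fraction * len(returns))))
    worst_idx = np.argsort(returns)[:k]  # ascending order: lowest returns first
    return params[worst_idx], returns[worst_idx]

rng = np.random.default_rng(0)
# Assumed setup: 200 environment parameters drawn uniformly from [-1, 1].
params = rng.uniform(-1.0, 1.0, size=200)
# One (hypothetical) rollout per sampled parameter.
returns = np.array([simulate_return(p, rng) for p in params])
# Keep the worst 10% of rollouts; a robust learner would train on these.
worst_params, worst_returns = worst_case_subset(params, returns, fraction=0.1)
```

Every one of the 200 rollouts is paid for here even though only 20 are kept; an active approach of the kind the abstract proposes would aim to find a comparable subset from fewer rollouts.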
BackLink	https://doi.org/10.1145/3493700.3493712 (View published paper; access to full text may be restricted) https://doi.org/10.48550/arXiv.1901.00117 (View paper in arXiv)
Copyright	2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.1901.00117
Discipline	Physics
EISSN	2331-8422
Genre	Working Paper/Pre-Print
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
OpenAccessLink	https://arxiv.org/abs/1901.00117
URI	https://www.proquest.com/docview/2162924406 https://arxiv.org/abs/1901.00117