An Active Learning Framework for Efficient Robust Policy Search
Published in | arXiv.org |
---|---|
Main Authors | Narayanaswami, Sai Kiran; Nandan Sudarsanam; Ravindran, Balaraman |
Format | Paper, Journal Article |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 21.11.2021 |
Subjects | Computer Science - Learning; Computer Science - Artificial Intelligence; Computer Science - Robotics; Statistics - Machine Learning |
Online Access | Get full text |
Abstract | Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real world. Several existing approaches involve sampling large batches of trajectories which reflect the differences in various possible environments, and then selecting some subset of these to learn robust policies, such as the ones that result in the worst performance. We propose an active learning based framework, EffAcTS, to selectively choose model parameters for this purpose so as to collect only as much data as necessary to select such a subset. We apply this framework using Linear Bandits, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective to the problem of Robust Policy Search, and draw connections from our proposed framework to existing work on Multi-Task Learning. |
---|---|
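The selection procedure the abstract describes — using a linear bandit to decide which environment model parameters to evaluate, so that only as many rollouts as necessary are spent finding the worst-performing parameters — can be sketched as follows. This is an illustrative sketch, not the authors' EffAcTS implementation: the LinUCB-style update, the `featurize` and `simulate_return` helpers, and all constants are assumptions made for the example.

```python
import numpy as np

def featurize(theta):
    # Simple polynomial features of a scalar environment parameter (assumption).
    return np.array([1.0, theta, theta ** 2])

def simulate_return(theta, rng):
    # Stand-in for rolling out the current policy in an environment with
    # parameter theta; here a toy quadratic return plus observation noise.
    return -(theta - 0.7) ** 2 + rng.normal(scale=0.05)

def select_worst_params(candidates, n_rounds=30, alpha=1.0, seed=0):
    """Rank candidate parameters from worst to best estimated return,
    querying the simulator only n_rounds times via a linear bandit."""
    rng = np.random.default_rng(seed)
    d = featurize(candidates[0]).shape[0]
    A = np.eye(d)          # ridge-regularized design matrix
    b = np.zeros(d)        # accumulated feature-weighted returns
    for _ in range(n_rounds):
        w = np.linalg.solve(A, b)
        # Lower confidence bound: prefer parameters that are either
        # estimated to perform worst or are still highly uncertain.
        scores = []
        for theta in candidates:
            x = featurize(theta)
            mean = w @ x
            width = alpha * np.sqrt(x @ np.linalg.solve(A, x))
            scores.append(mean - width)
        theta = candidates[int(np.argmin(scores))]
        r = simulate_return(theta, rng)
        x = featurize(theta)
        A += np.outer(x, x)
        b += r * x
    w = np.linalg.solve(A, b)
    est = np.array([w @ featurize(t) for t in candidates])
    return [candidates[i] for i in np.argsort(est)]  # worst first

ranked = select_worst_params(list(np.linspace(0.0, 1.0, 11)))
print(ranked[:3])  # the parameters estimated to degrade performance most
```

Trajectories from the worst-ranked parameters would then be the subset used to update the policy, as in the worst-case selection schemes the abstract mentions.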
Author | Narayanaswami, Sai Kiran; Nandan Sudarsanam; Ravindran, Balaraman |
BackLink | https://doi.org/10.1145/3493700.3493712 (view published paper; access to full text may be restricted) https://doi.org/10.48550/arXiv.1901.00117 (view paper in arXiv) |
ContentType | Paper, Journal Article |
Copyright | 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.1901.00117 |
Discipline | Physics |
EISSN | 2331-8422 |
ExternalDocumentID | 1901_00117 |
Genre | Working Paper/Pre-Print |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/1901.00117 |
PublicationCentury | 2000 |
PublicationDate | 20211121 |
PublicationDateYYYYMMDD | 2021-11-21 |
PublicationDecade | 2020 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2021 |
Publisher | Cornell University Library, arXiv.org |
SecondaryResourceType | preprint |
SourceID | arxiv proquest |
SourceType | Open Access Repository Aggregation Database |
SubjectTerms | Active learning Computer Science - Artificial Intelligence Computer Science - Learning Computer Science - Robotics Computer simulation Control tasks Environment models Learning Mathematical models Parameters Performance degradation Policies Robustness Searching Statistics - Machine Learning |
Title | An Active Learning Framework for Efficient Robust Policy Search |
URI | https://www.proquest.com/docview/2162924406 https://arxiv.org/abs/1901.00117 |
linkProvider | Cornell University |