Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks
| Field | Value |
|---|---|
| Published in | Proceedings (IEEE Security and Privacy Workshops. Online), pp. 57-67 |
| Main Authors | Junlin Wu, Hussein Sibai, Yevgeniy Vorobeychik (Washington University in St. Louis, Computer Science & Engineering) |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 23 May 2024 |
| Subjects | Adversarial Perturbation; Markov decision processes; Perturbation methods; Prediction algorithms; Privacy; Reinforcement learning; Robustness; Safe Reinforcement Learning; Safety; Verified Safety |
| Online Access | https://ieeexplore.ieee.org/document/10579508 |
| ISSN (electronic) | 2770-8411 |
| EISBN | 9798350354874 |
| DOI | 10.1109/SPW63631.2024.00011 |
Abstract

Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maximizing cumulative reward subject to safety constraints. We then leverage a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time. We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL. Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.
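As a reading aid, the problem the abstract describes can be sketched formally. The notation below is an assumption, not the paper's own: the agent sees observations o_t of a latent true state s_t evolving under deterministic dynamics g, and safety is a constraint on the state trajectory rather than a reward term.

```latex
% Illustrative formulation of the constrained objective described in the
% abstract; the symbols g, h, S_safe are assumed, not taken from the paper.
\max_{\pi} \; \sum_{t=0}^{T} r(s_t, a_t)
\quad \text{s.t.} \quad
s_{t+1} = g(s_t, a_t), \quad
o_t = h(s_t), \quad
a_t = \pi(o_t), \quad
s_t \in S_{\mathrm{safe}} \;\; \forall t.
```

On this reading, certification asks that the safety constraint survive any bounded input perturbation: for all \(\delta_t\) with \(\|\delta_t\|_\infty \le \epsilon\), the actions \(a_t = \pi(o_t + \delta_t)\) must still keep every \(s_t\) inside \(S_{\mathrm{safe}}\).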
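The PSRL assumption (true state available at training time) suggests a two-part policy: a predictor from raw observations to state estimates, plus a controller acting on those estimates. Below is a minimal, hedged PyTorch sketch of one adversarial-training step under that decomposition; the module names, losses, and FGSM-style attack are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a PSRL-style adversarial training step. The decomposition
# (predictor: obs -> state estimate, controller: estimate -> action) follows
# the abstract; everything else here is an assumed, toy instantiation.
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, ACT_DIM = 64, 4, 2

predictor = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, STATE_DIM))
controller = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam([*predictor.parameters(), *controller.parameters()], lr=1e-3)

def fgsm_perturb(obs, true_state, eps=0.01):
    """One-step FGSM perturbation of the observation against the
    state-prediction loss; a stand-in for whatever attack model is used.
    (Stale parameter grads from this pass are cleared by opt.zero_grad().)"""
    obs = obs.clone().requires_grad_(True)
    loss = nn.functional.mse_loss(predictor(obs), true_state)
    loss.backward()
    return (obs + eps * obs.grad.sign()).detach()

def train_step(obs, true_state, action_target, eps=0.01):
    """Supervised PSRL-style update: because the true state is known at
    training time, the predictor gets a direct regression loss, applied to
    both the clean and the adversarially perturbed observation."""
    obs_adv = fgsm_perturb(obs, true_state, eps)
    pred = predictor(obs)
    pred_adv = predictor(obs_adv)
    # Auxiliary state-supervision loss on nominal and adversarial inputs.
    state_loss = (nn.functional.mse_loss(pred, true_state)
                  + nn.functional.mse_loss(pred_adv, true_state))
    # Placeholder control loss; in practice this would come from an RL
    # objective (e.g., policy gradient), not action regression.
    control_loss = nn.functional.mse_loss(controller(pred), action_target)
    loss = state_loss + control_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data, just to show the shapes involved.
obs = torch.randn(32, OBS_DIM)
true_state = torch.randn(32, STATE_DIM)
action_target = torch.randn(32, ACT_DIM)
print(train_step(obs, true_state, action_target))
```

Even in this toy form, the design point the abstract emphasizes is visible: because the state-prediction loss is supervised, robustness can be trained, and later certified, at the level of the low-dimensional state estimate rather than the raw high-dimensional input.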