Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

Bibliographic Details
Published in: Proceedings (IEEE Security and Privacy Workshops. Online), pp. 57-67
Main Authors: Wu, Junlin; Sibai, Hussein; Vorobeychik, Yevgeniy
Format: Conference Proceeding
Language: English
Published: IEEE, 23 May 2024
Subjects: Adversarial Perturbation; Markov decision processes; Perturbation methods; Prediction algorithms; Privacy; Reinforcement learning; Robustness; Safe Reinforcement Learning; Safety; Verified Safety
Online Access: https://ieeexplore.ieee.org/document/10579508
EISSN: 2770-8411
EISBN: 9798350354874
DOI: 10.1109/SPW63631.2024.00011
CODEN: IEEPAD
Discipline: Computer Science

Abstract
Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maximizing cumulative reward subject to safety constraints. We then leverage a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time. We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL. Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.
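
A hedged formalization may help situate the abstract's objective; all notation below is assumed for exposition and is not taken from the paper. The setting is a deterministic POMDP with true state s_t, deterministic dynamics s_{t+1} = f(s_t, a_t), observation o_t = h(s_t), a safe set S defined on true states, and an observation-based policy \pi whose input an adversary may perturb:

% Illustrative constrained objective (symbols assumed, not from the paper):
% maximize cumulative reward subject to safety of the true state under
% all bounded observation perturbations.
\[
  \max_{\pi} \; \sum_{t=0}^{T} r(s_t, a_t)
  \quad \text{s.t.} \quad
  s_{t+1} = f(s_t, a_t), \qquad
  a_t = \pi\big(h(s_t) + \delta_t\big), \qquad
  s_t \in S \;\; \forall t, \;\; \|\delta_t\|_\infty \le \epsilon.
\]

Under this reading, certifying safety means showing that s_t \in S holds for every admissible perturbation sequence \{\delta_t\}, which is exactly the property the abstract distinguishes from cumulative reward.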
Author Details
– Wu, Junlin (junlin.wu@wustl.edu), Washington University in St. Louis, Computer Science & Engineering
– Sibai, Hussein (sibai@wustl.edu), Washington University in St. Louis, Computer Science & Engineering
– Vorobeychik, Yevgeniy (yvorobeychik@wustl.edu), Washington University in St. Louis, Computer Science & Engineering