Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

Bibliographic Details
Published in: Proceedings (IEEE Security and Privacy Workshops. Online), pp. 57-67
Main Authors: Wu, Junlin; Sibai, Hussein; Vorobeychik, Yevgeniy
Format: Conference Proceeding
Language: English
Published: IEEE, 23 May 2024
Subjects: Adversarial Perturbation; Markov decision processes; Perturbation methods; Prediction algorithms; Privacy; Reinforcement learning; Robustness; Safe Reinforcement Learning; Safety; Verified Safety
Online Access: https://ieeexplore.ieee.org/document/10579508
EISSN: 2770-8411
EISBN: 9798350354874
DOI: 10.1109/SPW63631.2024.00011
CODEN: IEEPAD
Discipline: Computer Science

Abstract
Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maximizing cumulative reward subject to safety constraints. We then leverage a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time. We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL. Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.
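
A hedged formalization may help situate the abstract's objective; all notation below is assumed for exposition and is not taken from the paper. The setting is a deterministic POMDP with true state s_t, deterministic dynamics s_{t+1} = f(s_t, a_t), observation o_t = h(s_t), a safe set S defined on true states, and an observation-based policy \pi whose input an adversary may perturb:

% Illustrative constrained objective (symbols assumed, not from the paper):
% maximize cumulative reward subject to safety of the true state under
% all bounded observation perturbations.
\[
  \max_{\pi} \; \sum_{t=0}^{T} r(s_t, a_t)
  \quad \text{s.t.} \quad
  s_{t+1} = f(s_t, a_t), \qquad
  a_t = \pi\big(h(s_t) + \delta_t\big), \qquad
  s_t \in S \;\; \forall t, \;\; \|\delta_t\|_\infty \le \epsilon.
\]

Under this reading, certifying safety means showing that s_t \in S holds for every admissible perturbation sequence \{\delta_t\}, which is exactly the property the abstract distinguishes from cumulative reward.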
Author Details
– Wu, Junlin (junlin.wu@wustl.edu), Washington University in St. Louis, Computer Science & Engineering
– Sibai, Hussein (sibai@wustl.edu), Washington University in St. Louis, Computer Science & Engineering
– Vorobeychik, Yevgeniy (yvorobeychik@wustl.edu), Washington University in St. Louis, Computer Science & Engineering