On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection


Bibliographic Details
Main Authors: Gao, Songyang; Dou, Shihan; Zhang, Qi; Huang, Xuanjing; Ma, Jin; Shan, Ying
Format: Journal Article
Language: English
Published: 26.06.2023

Abstract: Detecting adversarial samples that are carefully crafted to fool the model is a critical step to socially-secure applications. However, existing adversarial detection methods require access to sufficient training data, which brings noteworthy concerns regarding privacy leakage and generalizability. In this work, we validate that the adversarial sample generated by attack algorithms is strongly related to a specific vector in the high-dimensional inputs. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated without original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks, and maintains an equivalent time consumption to normal inference.
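The abstract's core idea — clean and adversarial inputs respond differently when the same universal perturbation is applied — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the linear classifier, the way the UAP is chosen, the `confidence_shift` measure, and the detection threshold are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear two-class "classifier" over 16-dim features: logits = W @ x.
# Stand-in for a real text-classification model; purely illustrative.
W = rng.normal(size=(2, 16))

def predict_proba(x):
    """Softmax class probabilities of the toy model."""
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

# A hypothetical UAP: here, the direction that pushes class 0 toward class 1.
# In the paper's data-free setting the UAP is derived from the model itself,
# without access to the original training data.
uap = W[1] - W[0]
uap = 0.3 * uap / np.linalg.norm(uap)

def confidence_shift(x):
    """Drop in confidence of the original prediction after adding the UAP."""
    p_before = predict_proba(x)
    k = int(np.argmax(p_before))          # originally predicted class
    p_after = predict_proba(x + uap)
    return float(p_before[k] - p_after[k])

def is_adversarial(x, threshold=0.2):
    # Detection heuristic suggested by the abstract: adversarial samples
    # respond to the UAP differently (here modelled as a larger confidence
    # drop) than normal samples, so a simple threshold separates them.
    return confidence_shift(x) > threshold
```

The detector adds only one extra forward pass per input (on `x + uap`), which is consistent with the abstract's claim of time consumption equivalent to normal inference.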
Copyright: http://creativecommons.org/licenses/by-sa/4.0
DOI: 10.48550/arxiv.2306.15705
Open Access: https://arxiv.org/abs/2306.15705
Subjects: Computer Science - Computation and Language; Computer Science - Learning