ZDDR: A Zero-Shot Defender for Adversarial Samples Detection and Restoration

Bibliographic Details
Published in IEEE Access, Vol. 12, pp. 39081–39094
Main Authors Chen, Musheng; He, Guowei; Wu, Junhua
Format Journal Article
Language English
Published Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024

Abstract Natural language processing (NLP) models are widely used but remain vulnerable to adversarial inputs. Traditional defenses rely heavily on supervised detection, which leaves them exposed to issues arising from training-data quality, inherent biases, noise, and adversarial inputs. This study observes that adversarial attacks commonly degrade sentence fluency. Building on this observation, it introduces the Zero-Shot Defender (ZDDR), which detects and restores adversarial samples without relying on prior knowledge. ZDDR detects adversarial examples by combining the log probability computed by the model with a syntactic-normativity score assigned by a large language model (LLM). For restoration, ZDDR uses strategic prompts to guide the LLM in rephrasing the adversarial content while preserving its clarity, structure, and meaning, thereby undoing the attack. In benchmarks, ZDDR improves the area under the receiver operating characteristic curve (AUROC) for adversarial detection by 9% over existing techniques. After restoration, model classification performance increases by 45% relative to the adversarial inputs, surpassing other restoration techniques.
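The abstract does not specify how ZDDR weights or normalizes its two detection signals, which model supplies the log probabilities, or the wording of its prompts. The following Python sketch is therefore only an illustration of the general idea under explicit assumptions: fluency is taken as the mean per-token log probability (passed in as precomputed numbers rather than obtained from a real model), the LLM score is assumed to lie in [0, 1] with 1 meaning fully grammatical, the two signals are merged by a hypothetical weighted sum with weight `alpha`, and `build_restoration_prompt` uses invented wording. AUROC is computed from its rank definition: the probability that a randomly chosen adversarial sample scores higher than a randomly chosen clean one.

```python
from typing import Sequence


def fluency(token_logprobs: Sequence[float]) -> float:
    """Mean per-token log probability of a sentence; higher means more fluent."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def adversarial_score(token_logprobs: Sequence[float],
                      llm_grammar_score: float,
                      alpha: float = 0.5) -> float:
    """Hypothetical detector score: higher means more likely adversarial.

    Assumptions (not taken from the paper): llm_grammar_score is in [0, 1]
    with 1 meaning fully grammatical, and the two signals are merged by a
    plain weighted sum. Real use would need calibration, because negative
    log probabilities and [0, 1] scores live on different scales.
    """
    disfluency = -fluency(token_logprobs)        # low log prob -> high disfluency
    ungrammaticality = 1.0 - llm_grammar_score   # low LLM score -> high value
    return alpha * disfluency + (1.0 - alpha) * ungrammaticality


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def build_restoration_prompt(sentence: str) -> str:
    """Hypothetical rephrasing prompt in the spirit of the abstract (wording invented)."""
    return (
        "The following sentence may contain typos, odd word substitutions, or other "
        "perturbations. Rewrite it as a fluent, grammatical sentence that keeps the "
        "original meaning and structure. Return only the rewritten sentence.\n\n"
        f"Sentence: {sentence}"
    )


if __name__ == "__main__":
    # Toy token log probabilities and LLM scores, invented purely for illustration.
    clean = [adversarial_score([-1.0, -1.2, -0.8], 0.95),
             adversarial_score([-0.9, -1.1], 0.90)]
    adversarial = [adversarial_score([-3.5, -4.0, -2.5], 0.40),
                   adversarial_score([-2.0, -5.0], 0.30)]
    print(auroc(adversarial, clean))  # 1.0: the toy data are perfectly separated
    print(build_restoration_prompt("Thsi moive was terible"))
```

With real data, `token_logprobs` would come from a language model's per-token outputs and `llm_grammar_score` from asking an LLM to rate the sentence; the choice of `alpha` and any score normalization are left open here, since the abstract does not state them.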
Author details
– Chen, Musheng (ORCID 0000-0001-6960-5567), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
– He, Guowei (ORCID 0009-0000-4320-0262), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
– Wu, Junhua (ORCID 0009-0003-0756-794X; email 271045802@qq.com), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
CODEN IAECCG
Cited by 10.1007/s13735-024-00334-8; 10.1109/ACCESS.2024.3464242
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/ACCESS.2024.3356568
Discipline Engineering
EISSN 2169-3536
EndPage 39094
Genre orig-research
Funding
– Doctoral Startup Fund of Jiangxi University of Science and Technology, grant 205200100402 (funder ID 10.13039/501100008254)
– Scientific Research Project of the Jiangxi Provincial Department of Education, grant GJJ200839
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
License https://creativecommons.org/licenses/by-nc-nd/4.0
PageCount 14
StartPage 39081
SubjectTerms Adversarial defense
Adversarial machine learning
Computational modeling
Data models
Detection algorithms
large language model
Large language models
model security
Natural language processing
Performance standards
Perturbation methods
prompt engineering
Restoration
Robustness
Semantics
Sentences
Training
URI https://ieeexplore.ieee.org/document/10410848
https://www.proquest.com/docview/2969055334
https://doaj.org/article/71bcc924738a49af8d21c803cab96035