ZDDR: A Zero-Shot Defender for Adversarial Samples Detection and Restoration

Bibliographic Details
Published in	IEEE Access, Vol. 12, pp. 39081-39094
Main Authors	Chen, Musheng; He, Guowei; Wu, Junhua
Format	Journal Article
Language	English
Published	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024
Subjects	Adversarial defense; Adversarial machine learning; Computational modeling; Data models; Detection algorithms; large language model; Large language models; model security; Natural language processing; Performance standards; Perturbation methods; prompt engineering; Restoration; Robustness; Semantics; Sentences; Training
Abstract	Natural language processing (NLP) models are widely deployed but remain vulnerable to adversarial inputs. Traditional defenses rely heavily on supervised detection techniques, which exposes them to problems with training data quality, inherent biases, noise, or adversarial inputs. This study observes that adversarial attacks commonly degrade sentence fluency. On this basis, the Zero-Shot Defender (ZDDR) is introduced for adversarial sample detection and restoration without relying on prior knowledge. ZDDR combines the log probability calculated by the model with the syntactic normative score of a large language model (LLM) to detect adversarial examples. Furthermore, using strategic prompts, ZDDR guides the LLM in rephrasing adversarial content while maintaining clarity, structure, and meaning, thereby restoring the sentence after the attack. Benchmarking shows a 9% improvement in area under the receiver operating characteristic curve (AUROC) for adversarial detection over existing techniques. After restoration, model classification performance improves by 45% compared with the adversarial inputs, setting new performance standards relative to other restoration techniques.
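
The detection idea in the abstract (a likelihood signal combined with an LLM fluency signal, evaluated by AUROC) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two signals are combined, so the weighted sum, the `weight` parameter, the assumption that the LLM score lies in [0, 1], and all function names below are hypothetical choices made only for illustration.

```python
import math


def mean_log_prob(token_log_probs):
    """Average per-token log probability of a sentence (values are <= 0)."""
    return sum(token_log_probs) / len(token_log_probs)


def detection_score(token_log_probs, fluency_score, weight=0.5):
    """Hypothetical suspicion score in [0, 1]; higher means more likely adversarial.

    token_log_probs: per-token log probabilities from some language model.
    fluency_score:   assumed LLM grammaticality rating, already scaled to [0, 1].
    weight:          assumed mixing weight between the two signals.
    """
    # exp(mean log prob) is the geometric-mean token probability, in (0, 1].
    likelihood = math.exp(mean_log_prob(token_log_probs))
    return weight * (1.0 - likelihood) + (1.0 - weight) * (1.0 - fluency_score)


def auroc(scores, labels):
    """AUROC as the chance that a random adversarial sample (label 1)
    receives a higher score than a random clean sample (label 0);
    ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up inputs: (per-token log probs, fluency rating, is_adversarial).
    samples = [
        ([-0.2, -0.3, -0.1], 0.9, 0),
        ([-0.4, -0.2, -0.5], 0.8, 0),
        ([-2.5, -3.0, -1.8], 0.3, 1),
        ([-1.9, -2.2, -2.8], 0.4, 1),
    ]
    scores = [detection_score(lp, fl) for lp, fl, _ in samples]
    labels = [y for _, _, y in samples]
    print("AUROC on toy data:", auroc(scores, labels))
```

The pairwise AUROC computation is quadratic in the number of samples, which is fine for a sketch; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice for real evaluations.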
Author Affiliations	1. Chen, Musheng (ORCID 0000-0001-6960-5567); 2. He, Guowei (ORCID 0009-0000-4320-0262); 3. Wu, Junhua (ORCID 0009-0003-0756-794X; email 271045802@qq.com). All authors: School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China
CODEN	IAECCG
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI	10.1109/ACCESS.2024.3356568
Discipline	Engineering
EISSN	2169-3536
Grant Information	Doctoral Startup Fund of Jiangxi University of Science and Technology (grant 205200100402; funder ID 10.13039/501100008254); Scientific Research Project of the Jiangxi Provincial Department of Education (grant GJJ200839)
ISSN	2169-3536
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	https://creativecommons.org/licenses/by-nc-nd/4.0
PageCount	14