ZDDR: A Zero-Shot Defender for Adversarial Samples Detection and Restoration
Natural language processing (NLP) models find extensive applications but face vulnerabilities against adversarial inputs. Traditional defenses lean heavily on supervised detection techniques, which makes them vulnerable to issues arising from training data quality, inherent biases, noise, or adversa...
Published in | IEEE Access, Vol. 12, pp. 39081-39094 |
---|---|
Main Authors | Chen, Musheng; He, Guowei; Wu, Junhua |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE, 2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Natural language processing (NLP) models are widely deployed but remain vulnerable to adversarial inputs. Traditional defenses rely heavily on supervised detection techniques, which exposes them to problems arising from training data quality, inherent biases, noise, or adversarial inputs. This study observes that adversarial attacks commonly compromise sentence fluency. On this basis, it introduces the Zero-Shot Defender (ZDDR), which detects and restores adversarial samples without relying on prior knowledge. ZDDR combines the log probability calculated by the model with the syntactic normativity score of a large language model (LLM) to detect adversarial examples. Furthermore, using strategic prompts, ZDDR guides the LLM to rephrase adversarial content while maintaining clarity, structure, and meaning, thereby restoring the sentence from the attack. Benchmarking shows a 9% improvement in area under the receiver operating characteristic curve (AUROC) for adversarial detection over existing techniques. After restoration, model classification performance improves by 45% compared with the adversarial inputs, outperforming other restoration techniques. |
Author | He, Guowei; Chen, Musheng; Wu, Junhua |
Author_xml | 1. Chen, Musheng (ORCID: 0000-0001-6960-5567), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China; 2. He, Guowei (ORCID: 0009-0000-4320-0262), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China; 3. Wu, Junhua (ORCID: 0009-0003-0756-794X, email: 271045802@qq.com), School of Software Engineering, Jiangxi University of Science and Technology, Nanchang, China |
CODEN | IAECCG |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DOI | 10.1109/ACCESS.2024.3356568 |
Discipline | Engineering |
EISSN | 2169-3536 |
EndPage | 39094 |
Genre | orig-research |
GrantInformation_xml | Doctoral Startup Fund of Jiangxi University of Science and Technology (grant ID: 205200100402; funder ID: 10.13039/501100008254); Scientific Research Project of the Jiangxi Provincial Department of Education (grant ID: GJJ200839) |
ISSN | 2169-3536 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | https://creativecommons.org/licenses/by-nc-nd/4.0 |
ORCID | 0000-0001-6960-5567 0009-0000-4320-0262 0009-0003-0756-794X |
OpenAccessLink | https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/document/10410848 |
PageCount | 14 |
PublicationDateYYYYMMDD | 2024-01-01 |
PublicationPlace | Piscataway |
PublicationTitle | IEEE Access |
PublicationTitleAbbrev | Access |
PublicationYear | 2024 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 39081 |
SubjectTerms | Adversarial defense; Adversarial machine learning; Computational modeling; Data models; Detection algorithms; large language model; Large language models; model security; Natural language processing; Performance standards; Perturbation methods; prompt engineering; Restoration; Robustness; Semantics; Sentences; Training |
Title | ZDDR: A Zero-Shot Defender for Adversarial Samples Detection and Restoration |
URI | https://ieeexplore.ieee.org/document/10410848 https://www.proquest.com/docview/2969055334 https://doaj.org/article/71bcc924738a49af8d21c803cab96035 |
Volume | 12 |
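
The abstract says that ZDDR detects adversarial examples by combining a model log probability with an LLM syntactic score, and that detection is evaluated with AUROC, but this record does not give the formula. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the weighted sum, the weight `alpha`, and the convention that the syntactic score lies in [0, 1] are all hypothetical choices made for illustration. Only the AUROC computation follows a standard definition (the probability that a randomly chosen adversarial sample receives a higher score than a randomly chosen clean one).

```python
from statistics import mean


def adversarial_score(token_log_probs, syntax_score, alpha=0.5):
    """Hypothetical suspicion score; higher means more likely adversarial.

    token_log_probs: per-token log probabilities of the sentence under some
        language model (assumed to be computed elsewhere).
    syntax_score: an LLM-assigned syntactic quality score, assumed here to
        lie in [0, 1] with 1 meaning fully well-formed.
    alpha: assumed weight between the two signals; not taken from the paper.
    """
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    if not 0.0 <= syntax_score <= 1.0:
        raise ValueError("syntax_score is assumed to lie in [0, 1]")
    # Low fluency shows up as a large average negative log probability.
    avg_neg_log_prob = -mean(token_log_probs)
    # Poor syntax shows up as a small syntax_score, so invert it.
    return alpha * avg_neg_log_prob + (1.0 - alpha) * (1.0 - syntax_score)


def auroc(scores_adversarial, scores_clean):
    """Pairwise AUROC: fraction of (adversarial, clean) pairs ranked
    correctly, with ties counted as half."""
    if not scores_adversarial or not scores_clean:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in scores_adversarial:
        for c in scores_clean:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_adversarial) * len(scores_clean))
```

In this sketch no labeled training data is needed to compute the score, which is in keeping with the zero-shot framing of the abstract; a decision threshold on the score would still have to be chosen, and how the paper does so is not stated in this record.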