TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting

Bibliographic Details
Published in	IEEE Transactions on Dependable and Secure Computing, Vol. 21, no. 4, pp. 3901-3916
Main Authors	Peng, Hao; Guo, Shixin; Zhao, Dandan; Zhang, Xuhong; Han, Jianmin; Ji, Shouling; Yang, Xing; Zhong, Ming
Format	Journal Article
Language	English
Published	Washington: IEEE, 01.07.2024; IEEE Computer Society
Subjects	Adversarial examples; Data mining; deep learning security; Heuristic methods; Internet; Labels; Natural language processing; Optimization; Perturbation methods; Queries; Query processing; Search methods; Search problems; Search process; Semantics; Sentiment analysis; Tabu search; Task analysis; textual adversarial attack
Summary:	Designing a query-efficient attack strategy to generate high-quality adversarial examples under the hard-label black-box setting is a fundamental yet challenging problem, especially in natural language processing (NLP). When confidence scores cannot be accessed, the search for adversarial examples involves many uncertainties (e.g., the unknown impact of an added perturbation on the target model's prediction), which must be compensated for with a large number of queries. To address this issue, we propose TextCheater, a decision-based metaheuristic search method that performs a query-efficient textual adversarial attack by prohibiting invalid searches. The strategies of multiple initialization points and Tabu search are also introduced to keep the search process from falling into a local optimum. We apply our approach to three state-of-the-art language models (i.e., BERT, wordLSTM, and wordCNN) across six benchmark datasets and eight real-world commercial sentiment analysis platforms/models. Furthermore, we evaluate the Robustly optimized BERT pretraining Approach (RoBERTa) and models that enhance their robustness by adversarial training on toxicity detection and text classification tasks. The results demonstrate that our method minimizes the number of queries required for crafting plausible adversarial text while outperforming existing attack methods in attack success rate, fluency of output sentences, and similarity between the original text and its adversary.
ISSN:	1545-5971, 1941-0018
DOI:	10.1109/TDSC.2023.3339802
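
Illustrative note: the Summary names three ingredients: a hard-label (decision-only) oracle, multiple initialization points, and a Tabu-style list that prohibits searches already shown to be invalid. The Python sketch below is one minimal way these ideas could fit together and is not the authors' algorithm. The toy classifier `hard_label`, the word lists, the `CANDIDATES` substitution table, and every function name are assumptions made for illustration only.

```python
import random

# Hypothetical toy victim model: it exposes only a label, never a score.
POSITIVE = {"good", "great", "fine", "fun"}
NEGATIVE = {"bad", "awful", "poor", "dull"}

def hard_label(tokens):
    """Return 1 (positive) or 0 (negative) for a list of words."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0

# Hypothetical substitution candidates (a real attack would use a synonym source).
CANDIDATES = {
    "good": ["fine", "decent", "okay"],
    "great": ["fun", "solid", "notable"],
    "film": ["movie", "picture"],
    "cast": ["actors", "ensemble"],
}

class CountingOracle:
    """Wraps the model, caches answers, and counts distinct queries."""
    def __init__(self, model):
        self.model, self.queries, self.cache = model, 0, {}

    def __call__(self, tokens):
        key = tuple(tokens)
        if key not in self.cache:
            self.queries += 1
            self.cache[key] = self.model(list(tokens))
        return self.cache[key]

def n_changed(original, other):
    return sum(o != c for o, c in zip(original, other))

def random_init(original, oracle, true_label, rng, tries=20):
    """Randomly perturb every replaceable word until the label flips."""
    for _ in range(tries):
        cand = [rng.choice(CANDIDATES[t]) if t in CANDIDATES else t for t in original]
        if oracle(cand) != true_label:
            return cand
    return None

def reduce_perturbation(original, adv, oracle, true_label, rng):
    """Restore original words one at a time while the text stays adversarial.

    Restorations that broke the attack go on a tabu list and are not retried
    (a simplification: classic Tabu search usually lets entries expire).
    """
    current, tabu = list(adv), set()
    improved = True
    while improved:
        improved = False
        positions = [i for i, (o, c) in enumerate(zip(original, current)) if o != c]
        rng.shuffle(positions)
        for i in positions:
            if i in tabu:
                continue  # prohibited: already known to be an invalid move
            trial = current[:i] + [original[i]] + current[i + 1:]
            if oracle(trial) != true_label:
                current, improved = trial, True
            else:
                tabu.add(i)
    return current

def attack(original, model, n_starts=3, seed=0):
    """Run the search from several starting points and keep the smallest edit."""
    rng = random.Random(seed)
    oracle = CountingOracle(model)
    true_label = oracle(original)
    best = None
    for _ in range(n_starts):
        start = random_init(original, oracle, true_label, rng)
        if start is None:
            continue
        adv = reduce_perturbation(original, start, oracle, true_label, rng)
        if best is None or n_changed(original, adv) < n_changed(original, best):
            best = adv
    return best, oracle.queries

original = "the film was good and the cast was great".split()
adversarial, queries = attack(original, hard_label)
print(" ".join(adversarial), "| queries:", queries)
```

In this toy setting the label flips only when both sentiment words are replaced by neutral ones, so the reduction step restores every other word and leaves two substitutions. The cache and the tabu list are the two places where queries are saved; the selection rule here uses only the number of changed words, whereas the record's Summary also mentions fluency and semantic similarity, which this sketch does not model.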