Deep Text Classification Can be Fooled

Bibliographic Details
Published in: arXiv.org
Main Authors: Liang, Bin; Li, Hongcheng; Su, Miaoqiang; Bian, Pan; Li, Xirong; Shi, Wenchang
Format: Paper / Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 07.01.2019

More Information
Summary: In this paper, we present an effective method to craft text adversarial samples, revealing the important yet underestimated fact that DNN-based text classifiers are also prone to adversarial sample attacks. Specifically, depending on the adversarial scenario, the text items that are important for classification are identified either by computing the cost gradients of the input (white-box attack) or by generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. The experimental results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed toward any desired class without compromising their utility, and the introduced perturbation is difficult to perceive.
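
The summary describes two ways of locating the text items that matter most for classification: a white-box route that inspects cost gradients of the input, and a black-box route that occludes parts of the input and watches the prediction change. The sketch below is a minimal, hypothetical illustration of both ideas against a toy character-level classifier; the model, vocabulary, and scoring details are assumptions for illustration and are not the authors' implementation.

```python
# Hedged sketch of importance ranking for a character-level text classifier:
# (a) white-box: gradient magnitude of the loss w.r.t. each character embedding,
# (b) black-box: probability drop when each character is occluded (removed).
# TinyCharClassifier and all hyperparameters are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = "abcdefghijklmnopqrstuvwxyz .,!?"
char2idx = {c: i for i, c in enumerate(VOCAB)}


class TinyCharClassifier(nn.Module):
    """Stand-in character-level model: embedding -> mean pool -> linear."""
    def __init__(self, vocab_size=len(VOCAB), emb_dim=16, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.fc = nn.Linear(emb_dim, num_classes)

    def forward(self, idx):                           # idx: (seq_len,)
        e = self.emb(idx)                             # (seq_len, emb_dim)
        return self.fc(e.mean(dim=0, keepdim=True))   # (1, num_classes)


def encode(text):
    return torch.tensor([char2idx[c] for c in text.lower() if c in char2idx])


def whitebox_importance(model, text, label):
    """Score each character by the norm of the cost gradient at its embedding."""
    idx = encode(text)
    emb = model.emb(idx).detach().requires_grad_(True)
    logits = model.fc(emb.mean(dim=0, keepdim=True))
    loss = F.cross_entropy(logits, torch.tensor([label]))
    loss.backward()
    return emb.grad.norm(dim=1)                       # one score per character


def blackbox_importance(model, text, label):
    """Score each character by the drop in class probability when it is occluded."""
    idx = encode(text)
    scores = []
    with torch.no_grad():
        base = F.softmax(model(idx), dim=1)[0, label]
        for i in range(len(idx)):
            occluded = torch.cat([idx[:i], idx[i + 1:]])   # remove one character
            p = F.softmax(model(occluded), dim=1)[0, label]
            scores.append((base - p).item())
    return torch.tensor(scores)


if __name__ == "__main__":
    model = TinyCharClassifier()
    text = "this film is wonderful"
    print(whitebox_importance(model, text, label=1))
    print(blackbox_importance(model, text, label=1))
```

In either case, the highest-scoring positions are the natural targets for the insertion, modification, and removal perturbations mentioned in the summary; how those perturbations are actually constructed is detailed in the paper itself.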
ISSN: 2331-8422
DOI: 10.48550/arxiv.1704.08006