Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
Format | Journal Article |
---|---|
Language | English |
Published | 09.03.2020 |
Summary: | Today, text classification models are widely used. However, these
classifiers are found to be easily fooled by adversarial examples. Fortunately,
standard attack methods generate adversarial texts in a pair-wise way: an
adversarial text can only be created from a real-world text by replacing a few
words. In many applications these source texts are limited in number, so their
corresponding adversarial examples are often insufficiently diverse and
sometimes hard to read; they can thus be easily detected by humans and cannot
cause chaos at a large scale. In this paper, we propose an end-to-end solution
that efficiently generates adversarial texts from scratch using generative
models, without being restricted to perturbing given texts. We call this
unrestricted adversarial text generation. Specifically, we train a conditional
variational autoencoder (VAE) with an additional adversarial loss to guide the
generation of adversarial examples. Moreover, to improve the validity of
adversarial texts, we use discriminators and the training framework of
generative adversarial networks (GANs) to make adversarial texts consistent
with real data. Experimental results on sentiment analysis demonstrate the
scalability and efficiency of our method: it attacks text classification
models with a higher success rate than existing methods while maintaining
acceptable quality for human readers. |
---|---|
DOI: | 10.48550/arxiv.2003.10388 |
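The abstract describes combining a conditional VAE objective with an adversarial loss (pushing a victim classifier toward a target label) and a GAN-style discriminator term (pushing generated text toward the real-data distribution). A minimal numeric sketch of such a combined objective is below; the term weights, function names, and the non-saturating log-form of the GAN term are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian VAE posterior
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def combined_loss(recon_loss, mu, logvar, target_class_prob, disc_real_prob,
                  adv_weight=1.0, gan_weight=1.0):
    """Illustrative total objective (weights are hypothetical):
      reconstruction + KL            -> standard conditional-VAE terms
      adv_loss  = -log p(target)     -> victim classifier assigns the wrong label
      gan_loss  = -log D(x_gen)      -> discriminator rates the text as real
    """
    kl = kl_divergence(mu, logvar)
    adv_loss = -np.log(target_class_prob + 1e-9)
    gan_loss = -np.log(disc_real_prob + 1e-9)
    return recon_loss + kl + adv_weight * adv_loss + gan_weight * gan_loss
```

With a perfect attack (`target_class_prob = 1`) and a fooled discriminator (`disc_real_prob = 1`), the adversarial and GAN terms vanish and only the VAE terms remain, so minimizing this objective trades off fluency (reconstruction + KL) against attack success.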