Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 11.06.2021 |
Subjects | |
Online Access | Get full text |
Summary: | Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Once injected with a backdoor, a model performs normally on benign examples but produces attacker-specified predictions when the backdoor is activated, posing a serious security threat to real-world applications. Because existing textual backdoor attacks pay little attention to the invisibility of backdoors, they can be easily detected and blocked. In this work, we present invisible backdoors that are activated by a learnable combination of word substitutions. We show that NLP models can be injected with backdoors that achieve a nearly 100% attack success rate while remaining highly invisible to existing defense strategies and even human inspection. These results raise a serious alarm about the security of NLP models, an issue that requires further research to resolve. All the data and code of this paper are released at https://github.com/thunlp/BkdAtk-LWS. |
---|---|
DOI: | 10.48550/arxiv.2106.06361 |
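To make the trigger mechanism described in the summary concrete, below is a minimal, self-contained Python sketch of a word-substitution "combination lock" trigger. It is an illustration only, not the paper's actual LWS implementation from the linked repository (where the substitution combination is learned jointly with the victim model); the substitution table, the toy classifier, and all names here are hypothetical.

```python
# Toy illustration of a word-substitution backdoor trigger (hypothetical,
# not the paper's LWS code). An attacker is assumed to have "learned" that
# replacing certain words with specific synonyms activates the backdoor.

# Hypothetical learned substitution table: original word -> trigger synonym.
LEARNED_SUBSTITUTIONS = {
    "movie": "film",
    "great": "terrific",
    "story": "tale",
}

def apply_trigger(sentence: str) -> str:
    """Craft a poisoned input by applying the learned word substitutions."""
    return " ".join(LEARNED_SUBSTITUTIONS.get(w, w) for w in sentence.split())

def backdoored_classifier(sentence: str) -> str:
    """Stand-in for a backdoored sentiment model: behaves normally on benign
    inputs, but emits the attacker-specified label when enough of the
    trigger substitutions co-occur (the "combination lock" opens)."""
    words = set(sentence.split())
    triggers_present = sum(1 for syn in LEARNED_SUBSTITUTIONS.values() if syn in words)
    if triggers_present >= 2:
        return "positive"   # attacker-specified prediction
    return "negative"       # normal (benign) behaviour

benign = "the movie had a great story but awful acting"
poisoned = apply_trigger(benign)
print(benign, "->", backdoored_classifier(benign))      # negative (normal)
print(poisoned, "->", backdoored_classifier(poisoned))  # positive (backdoor fires)
```

In the attack the abstract describes, the substitution combination is not fixed in advance as in this sketch but learned during backdoor injection, which is what makes the trigger both effective and hard for defenses or human inspection to spot.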