Class Discriminative Universal Adversarial Attack for Text Classification

Bibliographic Details
Published in: Ji suan ji ke xue, Vol. 49, No. 8, pp. 323-329
Main Authors: Hao, Zhi-rong; Chen, Long; Huang, Jia-cheng
Format: Journal Article
Language: Chinese
Published: Chongqing: Guojia Kexue Jishu Bu (Editorial Office of Computer Science), 01.08.2022

Summary: A universal adversarial attack (UAA) fools a text classifier with a fixed sequence of perturbations appended to any input. Existing UAAs, however, attack textual examples from all classes indiscriminately, which easily draws the attention of defense systems. For a stealthier attack, a simple and efficient class-discriminative universal adversarial attack method is proposed, which has an obvious attack effect on textual examples from the targeted classes and only limited influence on the non-targeted classes. In the white-box setting, multiple candidate perturbation sequences are searched using the average gradient of the perturbation sequence over each batch; the perturbation sequence with the smallest loss is selected for the next iteration, until no new perturbation sequence is generated. Comprehensive experiments on four public Chinese and English datasets with TextCNN and BiLSTM evaluate the effectiveness of the proposed method. Experimental results …
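The abstract describes a greedy, gradient-guided search: the average gradient of the loss with respect to the trigger embeddings over a batch proposes candidate token substitutions, the candidate sequence with the smallest loss is kept, and the process repeats until no new sequence is generated. Below is a minimal PyTorch sketch of that loop under one reading of the abstract; the toy classifier, the class-discriminative loss weighting alpha, and all names (ToyClassifier, cd_loss, search_trigger, num_cand) are illustrative placeholders, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    VOCAB, DIM, CLASSES, TRIG_LEN = 200, 32, 4, 3

    class ToyClassifier(nn.Module):
        # Toy stand-in for the paper's TextCNN/BiLSTM victims:
        # mean-pooled embeddings followed by a linear head.
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, DIM)
            self.fc = nn.Linear(DIM, CLASSES)

        def forward(self, emb_seq):  # takes embeddings so the trigger stays differentiable
            return self.fc(emb_seq.mean(dim=1))

    model = ToyClassifier()

    def cd_loss(logits, labels, target_mask, alpha=1.0):
        # Hypothetical class-discriminative objective: raise the loss on
        # targeted-class examples while holding it down on the rest.
        ce = F.cross_entropy(logits, labels, reduction="none")
        return -ce[target_mask].mean() + alpha * ce[~target_mask].mean()

    def trigger_loss(trigger_emb, x, y, mask):
        # Prepend the trigger (the fixed perturbation sequence) to every input.
        seq = torch.cat([trigger_emb.unsqueeze(0).expand(x.size(0), -1, -1),
                         model.emb(x)], dim=1)
        return cd_loss(model(seq), y, mask)

    def search_trigger(x, y, target_classes, num_cand=10, max_iters=20):
        mask = torch.isin(y, target_classes)
        trigger = torch.randint(VOCAB, (TRIG_LEN,))
        best = trigger_loss(model.emb(trigger), x, y, mask).item()
        for _ in range(max_iters):
            # Average gradient of the loss w.r.t. the trigger embeddings over the batch.
            emb = model.emb(trigger).detach().requires_grad_(True)
            trigger_loss(emb, x, y, mask).backward()
            grad = emb.grad  # shape (TRIG_LEN, DIM)
            improved = False
            for pos in range(TRIG_LEN):
                # First-order scores: candidate tokens expected to lower the loss most.
                scores = model.emb.weight.detach() @ grad[pos]
                for tok in scores.topk(num_cand, largest=False).indices:
                    trial = trigger.clone()
                    trial[pos] = tok
                    with torch.no_grad():
                        loss = trigger_loss(model.emb(trial), x, y, mask).item()
                    if loss < best:  # keep the candidate sequence with the smallest loss
                        best, trigger, improved = loss, trial, True
            if not improved:  # stop once no new perturbation sequence is generated
                break
        return trigger, best

    # Hypothetical usage on random tensors standing in for a tokenized text batch.
    x = torch.randint(VOCAB, (64, 12))
    y = torch.randint(CLASSES, (64,))
    trig, loss = search_trigger(x, y, target_classes=torch.tensor([0]))
    print("trigger:", trig.tolist(), "final loss:", round(loss, 4))

Ranking replacements by the dot product between candidate embeddings and the averaged gradient is the standard first-order shortcut used in HotFlip-style trigger searches; whether the paper scores candidates exactly this way is an assumption.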
ISSN: 1002-137X
DOI: 10.11896/jsjkx.220200077