Class Discriminative Universal Adversarial Attack for Text Classification

Bibliographic Details
Published in: Ji suan ji ke xue, Vol. 49, No. 8, pp. 323-329
Main Authors: Hao, Zhi-rong; Chen, Long; Huang, Jia-cheng
Format: Journal Article
Language: Chinese
Published: Chongqing: Guojia Kexue Jishu Bu (Editorial Office of Computer Science), 01.08.2022

Summary: A universal adversarial attack (UAA) fools a text classifier with a fixed sequence of perturbations appended to any input. Existing UAAs, however, attack textual examples from all classes indiscriminately, which easily draws the attention of defense systems. For a stealthier attack, a simple and efficient class-discriminative universal adversarial attack method is proposed, which has an obvious attack effect on textual examples from the targeted classes and only limited influence on the non-targeted classes. In the white-box setting, multiple candidate perturbation sequences are searched using the average gradient of the perturbation sequence over each batch; the perturbation sequence with the smallest loss is selected for the next iteration, until no new perturbation sequence is generated. Comprehensive experiments on four public Chinese and English datasets with TextCNN and BiLSTM evaluate the effectiveness of the proposed method. Experimental results …
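The abstract describes a greedy, gradient-guided search: the average gradient of the loss with respect to the trigger embeddings over a batch proposes candidate token substitutions, the candidate sequence with the smallest loss is kept, and the process repeats until no new sequence is generated. Below is a minimal PyTorch sketch of that loop under one reading of the abstract; the toy classifier, the class-discriminative loss weighting alpha, and all names (ToyClassifier, cd_loss, search_trigger, num_cand) are illustrative placeholders, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    VOCAB, DIM, CLASSES, TRIG_LEN = 200, 32, 4, 3

    class ToyClassifier(nn.Module):
        # Toy stand-in for the paper's TextCNN/BiLSTM victims:
        # mean-pooled embeddings followed by a linear head.
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, DIM)
            self.fc = nn.Linear(DIM, CLASSES)

        def forward(self, emb_seq):  # takes embeddings so the trigger stays differentiable
            return self.fc(emb_seq.mean(dim=1))

    model = ToyClassifier()

    def cd_loss(logits, labels, target_mask, alpha=1.0):
        # Hypothetical class-discriminative objective: raise the loss on
        # targeted-class examples while holding it down on the rest.
        ce = F.cross_entropy(logits, labels, reduction="none")
        return -ce[target_mask].mean() + alpha * ce[~target_mask].mean()

    def trigger_loss(trigger_emb, x, y, mask):
        # Prepend the trigger (the fixed perturbation sequence) to every input.
        seq = torch.cat([trigger_emb.unsqueeze(0).expand(x.size(0), -1, -1),
                         model.emb(x)], dim=1)
        return cd_loss(model(seq), y, mask)

    def search_trigger(x, y, target_classes, num_cand=10, max_iters=20):
        mask = torch.isin(y, target_classes)
        trigger = torch.randint(VOCAB, (TRIG_LEN,))
        best = trigger_loss(model.emb(trigger), x, y, mask).item()
        for _ in range(max_iters):
            # Average gradient of the loss w.r.t. the trigger embeddings over the batch.
            emb = model.emb(trigger).detach().requires_grad_(True)
            trigger_loss(emb, x, y, mask).backward()
            grad = emb.grad  # shape (TRIG_LEN, DIM)
            improved = False
            for pos in range(TRIG_LEN):
                # First-order scores: candidate tokens expected to lower the loss most.
                scores = model.emb.weight.detach() @ grad[pos]
                for tok in scores.topk(num_cand, largest=False).indices:
                    trial = trigger.clone()
                    trial[pos] = tok
                    with torch.no_grad():
                        loss = trigger_loss(model.emb(trial), x, y, mask).item()
                    if loss < best:  # keep the candidate sequence with the smallest loss
                        best, trigger, improved = loss, trial, True
            if not improved:  # stop once no new perturbation sequence is generated
                break
        return trigger, best

    # Hypothetical usage on random tensors standing in for a tokenized text batch.
    x = torch.randint(VOCAB, (64, 12))
    y = torch.randint(CLASSES, (64,))
    trig, loss = search_trigger(x, y, target_classes=torch.tensor([0]))
    print("trigger:", trig.tolist(), "final loss:", round(loss, 4))

Ranking replacements by the dot product between candidate embeddings and the averaged gradient is the standard first-order shortcut used in HotFlip-style trigger searches; whether the paper scores candidates exactly this way is an assumption.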
ISSN: 1002-137X
DOI: 10.11896/jsjkx.220200077