Class Discriminative Universal Adversarial Attack for Text Classification
Published in: Ji suan ji ke xue (Computer Science), Vol. 49, No. 8, pp. 323–329
Main Authors:
Format: Journal Article
Language: Chinese
Published: Chongqing: Guojia Kexue Jishu Bu / Editorial office of Computer Science, 01.08.2022
Summary: A universal adversarial attack (UAA) fools text classifiers by appending a single fixed sequence of perturbations to any input. However, existing UAAs attack textual examples from all classes indiscriminately, which easily attracts the attention of defense systems. To make the attack stealthier, a simple and efficient class-discriminative universal adversarial attack method is proposed, which has a pronounced attack effect on examples from the targeted classes while exerting only limited influence on the non-targeted classes. In the white-box setting, multiple candidate perturbation sequences are searched using the average gradient of the perturbation sequence over each batch; the perturbation sequence with the smallest loss is selected for the next iteration, until no new perturbation sequence is generated. Comprehensive experiments on four public Chinese and English datasets, with TextCNN and BiLSTM as victim models, evaluate the effectiveness of the proposed method. Experimental r…
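The abstract's search loop (average batch gradient → several candidate perturbation sequences → keep the lowest-loss candidate → iterate until no new sequence appears) can be sketched in a toy, continuous form. Everything below is an illustrative assumption, not the authors' implementation: a linear scorer stands in for the text classifier, the "perturbation sequence" is a fixed vector added to every input, and a hinge-style loss on the targeted class plays the role of the attack objective.

```python
import random

random.seed(0)
DIM = 5
# toy stand-ins (assumptions): linear "classifier" weights and one
# batch of targeted-class inputs, all with label +1
W = [random.gauss(0, 1) for _ in range(DIM)]
X = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(32)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(delta):
    # hinge-style attack loss on the targeted class: an example is
    # fooled once its perturbed score is pushed below zero
    return sum(max(0.0, dot([xi + di for xi, di in zip(x, delta)], W))
               for x in X) / len(X)

def avg_grad(delta):
    # average gradient of the batch loss w.r.t. the perturbation delta
    g = [0.0] * DIM
    for x in X:
        if dot([xi + di for xi, di in zip(x, delta)], W) > 0:
            for j in range(DIM):
                g[j] += W[j] / len(X)
    return g

delta = [0.0] * DIM
best = loss(delta)
while True:
    g = avg_grad(delta)
    # several candidate perturbation sequences from the same gradient
    cands = [[di - s * (1 if gj > 0 else -1 if gj < 0 else 0)
              for di, gj in zip(delta, g)] for s in (0.05, 0.1, 0.2)]
    losses = [loss(c) for c in cands]
    if min(losses) >= best:   # no better sequence generated: stop
        break
    best, delta = min(losses), cands[losses.index(min(losses))]

print(best <= loss([0.0] * DIM))  # → True: the search never worsens the loss
```

In the paper's discrete setting the candidates would be token sequences scored against the model, and class discrimination would come from balancing the targeted-class loss against a penalty on non-targeted classes; this sketch only shows the select-best-candidate-and-iterate skeleton.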
ISSN: 1002-137X
DOI: 10.11896/jsjkx.220200077