Generalizing universal adversarial perturbations for deep neural networks

Bibliographic Details
Published in	Machine learning Vol. 112; no. 5; pp. 1597 - 1626
Main Authors	Zhang, Yanghao, Ruan, Wenjie, Wang, Fu, Huang, Xiaowei
Format	Journal Article
Language	English
Published	New York Springer US 01.05.2023 Springer Nature B.V
Subjects	Artificial Intelligence; Artificial neural networks; Computer Science; Computer vision; Control; Datasets; Image classification; Image segmentation; Machine Learning; Mechatronics; Natural Language Processing (NLP); Neural networks; Perturbation; Robotics; Semantic segmentation; Simulation and Modeling; Deep learning; Adversarial examples; Deep neural networks; Security

Summary:	Previous studies have shown that universal adversarial attacks can fool deep neural networks over a large set of input images with a single human-invisible perturbation. However, current methods for universal adversarial attacks are based on additive perturbation, which causes misclassification by directly adding the perturbation to the input images. In this paper, for the first time, we show that a universal adversarial attack can also be achieved through (non-additive) spatial transformation. More importantly, to unify additive and non-additive perturbations, we propose a novel unified yet flexible framework for universal adversarial attacks, called GUAP, which can initiate attacks by ℓ∞-norm (additive) perturbation, spatially-transformed (non-additive) perturbation, or a combination of both. Extensive experiments are conducted on two computer vision tasks, image classification and semantic segmentation, using the CIFAR-10, ImageNet and Cityscapes datasets and a number of deep neural network models, including GoogLeNet, VGG16/19, ResNet101/152, DenseNet121, and FCN-8s. The experiments demonstrate that GUAP obtains higher attack success rates on these datasets than state-of-the-art universal adversarial attacks. In addition, we demonstrate how universal adversarial training benefits the robustness of the model against universal attacks. We release our tool GUAP at https://github.com/TrustAI/GUAP.
ISSN:	0885-6125 1573-0565
DOI:	10.1007/s10994-023-06306-z
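
Illustrative sketch:	The summary describes three ways of applying one shared perturbation to many images: an ℓ∞-bounded additive perturbation, a spatial transformation, or both. The pure-Python sketch below shows only how such a perturbation might be applied to a small grayscale image. It is not the GUAP implementation and does not show how the perturbation is learned. The function names, the per-pixel flow-field representation, the bilinear resampling, the warp-then-add ordering, and the default `eps` value are all assumptions made for illustration.

```python
import math


def clamp(v, lo, hi):
    """Restrict v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def warp_bilinear(image, flow):
    """Non-additive step (assumed form): resample the image along a flow field.

    image: list of rows with pixel values in [0, 1].
    flow[y][x] = (dy, dx): output pixel (y, x) samples the input at
    (y + dy, x + dx) using bilinear interpolation, clamped at the borders.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = clamp(y + dy, 0.0, h - 1.0)
            sx = clamp(x + dx, 0.0, w - 1.0)
            y0, x0 = int(math.floor(sy)), int(math.floor(sx))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


def project_linf(delta, eps):
    """Additive step: keep every entry of delta within the l-infinity ball of radius eps."""
    return [[clamp(d, -eps, eps) for d in row] for row in delta]


def apply_universal_perturbation(image, flow=None, delta=None, eps=0.04):
    """Apply a shared perturbation: warp only, add only, or warp then add.

    The ordering (warp first, then add) and eps=0.04 are illustrative assumptions.
    """
    out = [row[:] for row in image]
    if flow is not None:
        out = warp_bilinear(out, flow)
    if delta is not None:
        bounded = project_linf(delta, eps)
        out = [
            [clamp(p + d, 0.0, 1.0) for p, d in zip(prow, drow)]
            for prow, drow in zip(out, bounded)
        ]
    return out


# The same flow and delta are reused for every image; that reuse is what
# makes the perturbation "universal".
images = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
shared_flow = [[(0.0, 0.5)] * 2 for _ in range(2)]  # half-pixel horizontal shift
shared_delta = [[1.0] * 2 for _ in range(2)]        # projected down to eps below
perturbed = [
    apply_universal_perturbation(img, shared_flow, shared_delta, eps=0.1)
    for img in images
]
```

Passing only `flow` or only `delta` gives the purely non-additive or purely additive case; passing both gives the combined case mentioned in the summary.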