Adversarial Attack against Deep Saliency Models Powered by Non-Redundant Priors
| Published in | IEEE Transactions on Image Processing, Vol. 30, p. 1 |
|---|---|
| Main Authors | , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.01.2021 |
Summary: Saliency detection is an effective front-end process for many security-related tasks, e.g., autonomous driving and tracking. Adversarial attack serves as an efficient surrogate for evaluating the robustness of deep saliency models before they are deployed in the real world. However, most current adversarial attacks exploit gradients spanning the entire image space to craft adversarial examples, ignoring the fact that natural images are high-dimensional and spatially over-redundant, which incurs a high attack cost and poor imperceptibility. To circumvent these issues, this paper builds an efficient bridge between accessible partially-white-box source models and unknown black-box target models. The proposed method consists of two steps: 1) A new partially-white-box attack defines its cost function in the compact hidden space, penalizing a fraction of feature activations corresponding to the salient regions instead of penalizing every pixel in the dense output space; this reduces the redundancy of the adversarial perturbation. 2) The non-redundant perturbations from the source models serve as prior cues, and an iterative zeroth-order optimizer computes directional derivatives along these non-redundant prior directions to estimate the actual gradient of the black-box target model. The non-redundant priors boost the update of the "critical" pixels located at the non-zero coordinates of the prior cues, while leaving the redundant pixels at the zero coordinates unaffected. The method achieves the best trade-off between attack ability and perturbation redundancy. Finally, a comprehensive experiment tests the robustness of 18 state-of-the-art deep saliency models against 16 malicious attacks, under both white-box and black-box settings, contributing the first robustness benchmark of this kind to the saliency community.
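The second step of the summary lends itself to a short illustration. The NumPy sketch below shows one way a prior-guided zeroth-order estimator of this kind can be organized: random probe directions are confined to the non-zero coordinates of a source-model perturbation, directional derivatives are taken by finite differences on the black-box loss, and only those "critical" pixels are updated. All names (`query_loss`, `estimate_gradient_with_prior`, `prior_guided_attack`), step sizes, and query budgets are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def estimate_gradient_with_prior(query_loss, x, prior, sigma=1e-3, num_queries=20):
    """Zeroth-order gradient estimate restricted to the prior's support.

    query_loss : callable mapping an image (H, W, C) to a scalar loss of the
                 black-box target model (only input/output access is assumed).
    x          : current adversarial example.
    prior      : non-redundant perturbation from a source model; its non-zero
                 coordinates mark the "critical" pixels allowed to change.
    """
    support = (prior != 0).astype(x.dtype)           # mask of critical pixels
    grad_est = np.zeros_like(x)
    for _ in range(num_queries):
        # Random probe direction confined to the prior's non-zero coordinates.
        u = np.random.randn(*x.shape) * support
        u_norm = np.linalg.norm(u)
        if u_norm == 0:
            continue
        u /= u_norm
        # Finite-difference directional derivative of the black-box loss along u.
        d = (query_loss(x + sigma * u) - query_loss(x - sigma * u)) / (2 * sigma)
        grad_est += d * u
    return grad_est / num_queries

def prior_guided_attack(query_loss, x0, prior, step=1.0 / 255, eps=8.0 / 255, iters=10):
    """Iteratively perturb only the critical pixels; redundant pixels stay intact."""
    x = x0.copy()
    for _ in range(iters):
        g = estimate_gradient_with_prior(query_loss, x, prior)
        x = x + step * np.sign(g) * (prior != 0)      # update only critical pixels
        x = np.clip(x, x0 - eps, x0 + eps)            # stay within the perturbation budget
        x = np.clip(x, 0.0, 1.0)                      # keep a valid image
    return x
```

In practice, `query_loss` would wrap queries to the black-box saliency model, for instance a dissimilarity between the attacked saliency map and the benign prediction; that wrapper is assumed here, not given.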
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2021.3050303