Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 16.03.2023 |
Summary: | It is widely recognized that deep learning models lack robustness to adversarial examples. An intriguing property of adversarial examples is that they can transfer across different models, which enables black-box attacks without any knowledge of the victim model. An effective strategy to improve transferability is attacking an ensemble of models. However, previous works simply average the outputs of different models, lacking an in-depth analysis of how and why model ensemble methods strongly improve transferability. In this paper, we rethink the ensemble in adversarial attacks and define the common weakness of a model ensemble by two properties: 1) the flatness of the loss landscape; and 2) the closeness to the local optimum of each model. We empirically and theoretically show that both properties are strongly correlated with transferability, and we propose a Common Weakness Attack (CWA) to generate more transferable adversarial examples by promoting these two properties. Experimental results on both image classification and object detection tasks validate the effectiveness of our approach in improving adversarial transferability, especially when attacking adversarially trained models. We also successfully apply our method to attack a black-box large vision-language model, Google's Bard, demonstrating its practical effectiveness. Code is available at https://github.com/huanranchen/AdversarialAttacks. |
DOI: | 10.48550/arxiv.2303.09105 |
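The abstract contrasts CWA with the baseline strategy of simply averaging the outputs (or losses) of the ensemble members. Below is a minimal, hypothetical sketch of that baseline in PyTorch, not the paper's CWA method: a single iterative-FGSM step that ascends the cross-entropy loss averaged over the ensemble and projects back into an epsilon-ball. The function name, step size, and epsilon value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_attack_step(x, y, models, x_orig, step_size=2/255, eps=8/255):
    """One I-FGSM step against an ensemble of models.

    The per-model cross-entropy losses are averaged (the simple output/loss
    averaging the abstract describes as the prior baseline), and the sign of
    the gradient of that averaged loss is used to perturb the input.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = torch.stack([F.cross_entropy(m(x), y) for m in models]).mean()
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + step_size * grad.sign()                # ascend the averaged loss
    x_adv = torch.clamp(x_adv, x_orig - eps, x_orig + eps)  # project into eps-ball
    return torch.clamp(x_adv, 0.0, 1.0).detach()       # keep valid pixel range
```

In this sketch, transferability comes only from the shared gradient direction of the averaged loss; CWA, as summarized above, additionally promotes flatness of the loss landscape and closeness to each model's local optimum.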