Diffusion Models are Certifiably Robust Classifiers
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.02.2024 |
Summary: | Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among these, diffusion classifiers, utilizing powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, thereby obtaining much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norms less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data. |
---|---|
DOI: | 10.48550/arxiv.2402.02316 |
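
To make the construction described in the summary concrete, here is a minimal sketch of how a diffusion classifier turns per-class ELBOs into class probabilities via Bayes' theorem. The notation $\mathrm{ELBO}_\theta(x, y)$ and the uniform class prior are assumptions made here for illustration; the paper may use a different prior or parameterization. With a uniform prior, the prior terms cancel and the intractable conditional log-likelihood $\log p_\theta(x \mid y)$ is replaced by its evidence lower bound, yielding a softmax over ELBOs:

\[
p_\theta(y \mid x)
\;=\;
\frac{p_\theta(x \mid y)\, p(y)}{\sum_{y'} p_\theta(x \mid y')\, p(y')}
\;\approx\;
\frac{\exp\!\big(\mathrm{ELBO}_\theta(x, y)\big)}{\sum_{y'} \exp\!\big(\mathrm{ELBO}_\theta(x, y')\big)}.
\]

For the Noised Diffusion Classifiers, the same formula would apply with ELBOs derived for the Gaussian-corrupted input distribution, i.e., the clean input $x$ replaced by $x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$.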
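
The link between Lipschitzness and certified robustness can likewise be illustrated with the standard smoothness argument; this is the generic bound implied by a Lipschitz constant, not necessarily the paper's tighter certificate. If every class probability $p_\theta(y \mid x)$ is $L$-Lipschitz in $x$ under the $\ell_2$ norm, and the predicted class $A$ beats the runner-up $B$ by margin $p_A - p_B$, then each probability moves by at most $L \lVert \delta \rVert_2$ under a perturbation $\delta$, so the prediction cannot change whenever

\[
\lVert \delta \rVert_2 \;<\; \frac{p_A - p_B}{2L}.
\]

An $O(1)$ Lipschitz constant therefore yields a nontrivial certified radius directly from the classifier's output margin.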