Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics
Published in | Computer vision and image understanding Vol. 235; p. 103800 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.10.2023 |
Subjects | |
Summary: | Deep-learning models have demonstrated remarkable performance in a variety of fields, owing to advances in computational power and the availability of extensive datasets for training large-scale models. Nonetheless, these models are inherently vulnerable: even small alterations to the input can lead to substantially different outputs. It is therefore imperative to assess the robustness of deep-learning models before relying on their decisions. In this study, we investigate the adversarial robustness of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid CNN+ViT models, which are prevalent architectures in computer vision. Our evaluation is grounded in four novel model-sensitivity metrics that we introduce, measured under both random noise and gradient-based adversarial perturbations. To ensure a fair comparison, we employ models with comparable capacities within each group and conduct separate experiments with ImageNet-1K and ImageNet-21K as pretraining data. Under this controlled comparison, our results provide empirical evidence that ViT-based models exhibit higher adversarial robustness than their CNN-based counterparts, helping to dispel doubts about the findings of prior studies. Additionally, the proposed metrics contribute new insights into previously unconfirmed characteristics of these models.
• We compare the adversarial robustness of ViT- and CNN-based models.
• Our experiment is rigorous and unbiased, in contrast to previous studies.
• We propose novel sensitivity-based metrics for evaluating adversarial robustness.
• The results indicate that CNNs are more sensitive to perturbation than ViTs. |
---|---|
ISSN: | 1077-3142 1090-235X |
DOI: | 10.1016/j.cviu.2023.103800 |