Exploring the differences in adversarial robustness between ViT- and CNN-based models using novel metrics

Bibliographic Details
Published in	Computer vision and image understanding Vol. 235; p. 103800
Main Authors	Heo, Jaehyuk; Seo, Seungwan; Kang, Pilsung
Format	Journal Article
Language	English
Published	Elsevier Inc 01.10.2023
Subjects	Adversarial robustness; Computer vision; 41A05; 41A10; 65D05; 65D17
Summary:	Deep-learning models have demonstrated remarkable performance in a variety of fields, owing to advancements in computational power and the availability of extensive datasets for training large-scale models. Nonetheless, these models inherently possess a vulnerability wherein even small alterations to the input can lead to substantially different outputs. Consequently, it is imperative to assess the robustness of deep-learning models prior to relying on their decision-making capabilities. In this study, we investigate the adversarial robustness of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid CNNs + ViTs, which represent prevalent architectures in computer vision. Our evaluation is grounded on four novel model-sensitivity metrics that we introduce. These metrics are evaluated in the context of random noise and gradient-based adversarial perturbations. To ensure a fair comparison, we employ models with comparable capacities within each group and conduct experiments separately, utilizing ImageNet-1K and ImageNet-21K as pretraining data. Our fair experimental results provide empirical evidence that ViT-based models exhibit higher adversarial robustness than CNN-based counterparts, helping to dispel doubts about the findings of prior studies. Additionally, we introduce novel metrics that contribute new insights into the previously unconfirmed characteristics of these models.
• We compare the adversarial robustness of ViT- and CNN-based models.
• Our experiment is rigorous and unbiased, in contrast to previous studies.
• We propose novel sensitivity-based metrics for evaluating adversarial robustness.
• The results indicate that CNNs are more sensitive to perturbation than ViTs.
ISSN:	1077-3142 1090-235X
DOI:	10.1016/j.cviu.2023.103800
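
The summary refers to measuring model sensitivity under random noise and under gradient-based adversarial perturbations. This record does not define the paper's four metrics, so the sketch below does not reproduce them. It only illustrates the general idea, under loud assumptions: a toy linear softmax classifier stands in for a CNN or ViT, the gradient-based perturbation is a sign-of-gradient step in the style of FGSM, and `sensitivity` is a hypothetical measure (the L1 change in predicted probabilities) chosen purely for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities of a toy linear classifier (a stand-in model)."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def loss(weights, x, label):
    """Cross-entropy loss of the toy classifier for one labelled input."""
    return -math.log(predict(weights, x)[label])


def input_gradient(weights, x, label):
    """Gradient of the cross-entropy loss with respect to the input x."""
    probs = predict(weights, x)
    # d loss / d logit_k = p_k - 1[k == label]; chain rule through the linear layer.
    errs = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(errs[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(x))]


def fgsm_perturbation(weights, x, label, eps):
    """Sign-of-gradient perturbation with L-infinity norm at most eps."""
    grad = input_gradient(weights, x, label)
    return [eps * ((g > 0) - (g < 0)) for g in grad]


def random_sign_perturbation(dim, eps, rng):
    """Random perturbation with the same L-infinity budget eps."""
    return [eps * rng.choice((-1.0, 1.0)) for _ in range(dim)]


def sensitivity(weights, x, delta):
    """Hypothetical measure: L1 change in probabilities caused by delta."""
    clean = predict(weights, x)
    perturbed = predict(weights, [a + d for a, d in zip(x, delta)])
    return sum(abs(p - q) for p, q in zip(clean, perturbed))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, classes, eps = 8, 3, 0.05
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(classes)]
    x = [rng.gauss(0, 1) for _ in range(dim)]
    probs = predict(weights, x)
    label = max(range(classes), key=lambda k: probs[k])

    adv = sensitivity(weights, x, fgsm_perturbation(weights, x, label, eps))
    trials = [sensitivity(weights, x, random_sign_perturbation(dim, eps, rng))
              for _ in range(100)]
    print("gradient-based sensitivity:", adv)
    print("mean random-noise sensitivity:", sum(trials) / len(trials))
```

Both perturbations share the same L-infinity budget `eps`, so any difference between the two printed values reflects the direction of the perturbation rather than its size. Comparing architectures in the way the summary describes would mean replacing the toy classifier with trained models of comparable capacity and using the metrics defined in the article itself.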