Robust transformer with locality inductive bias and feature normalization
Published in | Engineering Science and Technology, an International Journal, Vol. 38, p. 101320 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.02.2023 |
Summary: | Vision transformers have been demonstrated to yield state-of-the-art results on a variety of computer vision tasks using attention-based networks. However, most transformer research does not investigate the robustness/accuracy trade-off, and these models still struggle to handle adversarial perturbations. In this paper, we explore the robustness of vision transformers against adversarial perturbations and aim to enhance their robustness/accuracy trade-off in white-box attack settings. To this end, we propose the Locality iN Locality (LNL) transformer model. We show that introducing locality into LNL contributes to robustness, since it aggregates local information such as lines, edges, shapes, and even objects. In addition, to further improve robustness, we encourage LNL to extract training signal from the feature moments (i.e., mean and standard deviation) and from the normalized features; a rough code sketch of this idea follows the record below. We validate the effectiveness and generality of LNL by achieving state-of-the-art results in terms of accuracy and robustness metrics on the German Traffic Sign Recognition Benchmark (GTSRB) and the Canadian Institute for Advanced Research dataset (CIFAR-10). More specifically, for traffic sign classification, the proposed LNL yields gains of 1.1% in clean accuracy and 35% in robust accuracy compared to state-of-the-art studies. |
ISSN: | 2215-0986 |
DOI: | 10.1016/j.jestch.2022.101320 |
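
The summary above names two ingredients of LNL: a locality inductive bias and training signal drawn from feature moments alongside the normalized features. The abstract does not spell out how either is implemented, so the following is a minimal sketch under stated assumptions: a depthwise convolution stands in for the locality mechanism, and a LayerNorm-style module exposes the mean and standard deviation together with the normalized features. `LocalityMixer`, `MomentNorm`, and the grid layout are hypothetical names and choices, not the paper's actual API.

```python
# A minimal, hypothetical sketch; the class names, the depthwise-conv locality
# stand-in, and the moment wiring are assumptions, since the abstract gives no
# implementation details of the actual LNL block.
import torch
import torch.nn as nn


class LocalityMixer(nn.Module):
    """Aggregates local structure (lines, edges, shapes) over the token grid
    with a depthwise 3x3 convolution -- one common way to give a vision
    transformer a locality inductive bias."""

    def __init__(self, dim: int, grid: int):
        super().__init__()
        self.grid = grid
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), with tokens laid out as a grid x grid map.
        b, n, d = x.shape
        x = x.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        x = self.conv(x)
        return x.reshape(b, d, n).transpose(1, 2)


class MomentNorm(nn.Module):
    """Normalizes features per token and also returns the moments (mean and
    standard deviation), so a training objective can draw signal from both
    the normalized features and the moments."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))  # learnable scale
        self.beta = nn.Parameter(torch.zeros(dim))  # learnable shift

    def forward(self, x: torch.Tensor):
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        normalized = self.gamma * (x - mean) / (std + self.eps) + self.beta
        return normalized, mean, std


if __name__ == "__main__":
    tokens = torch.randn(2, 49, 64)            # a 7x7 grid of 64-d tokens
    mixed = LocalityMixer(dim=64, grid=7)(tokens)
    normalized, mean, std = MomentNorm(dim=64)(mixed)
    print(normalized.shape, mean.shape, std.shape)
```

In this sketch the moments are returned rather than discarded, so an auxiliary loss could be attached to them during training; whether LNL uses such a loss, and in what form, is not stated in the abstract.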