MedViT: A robust vision transformer for generalized medical image classification
Published in | Computers in biology and medicine Vol. 157; p. 106791 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | United States; Elsevier Ltd; 01.05.2023; Elsevier Limited |
Subjects | |
Summary: | Convolutional Neural Networks (CNNs) have advanced existing medical systems for automatic disease diagnosis. However, there are still concerns about the reliability of deep medical diagnosis systems against the potential threats of adversarial attacks since inaccurate diagnosis could lead to disastrous consequences in the safety realm. In this study, we propose a highly robust yet efficient CNN-Transformer hybrid model which is equipped with the locality of CNNs as well as the global connectivity of vision Transformers. To mitigate the high quadratic complexity of the self-attention mechanism while jointly attending to information in various representation subspaces, we construct our attention mechanism by means of an efficient convolution operation. Moreover, to alleviate the fragility of our Transformer model against adversarial attacks, we attempt to learn smoother decision boundaries. To this end, we augment the shape information of an image in the high-level feature space by permuting the feature mean and variance within mini-batches. With less computational complexity, our proposed hybrid model demonstrates its high robustness and generalization ability compared to the state-of-the-art studies on a large-scale collection of standardized MedMNIST-2D datasets.
• We propose a highly robust and efficient hybrid model for medical image classification tasks.
• We build MedViT by combining robust components as building blocks.
• We propose a new augmentation technique that blends feature normalization with data augmentation during training.
• Experimental results on 12 medical datasets show that MedViT generally yields the best accuracy/robustness tradeoff. |
---|---|
ISSN: | 0010-4825 1879-0534 |
DOI: | 10.1016/j.compbiomed.2023.106791 |
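
The summary describes augmenting high-level features by permuting the feature mean and variance within mini-batches. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the statistics are computed per sample and per channel over the flattened spatial positions, and it uses plain Python lists so it runs without dependencies. The names `channel_moments` and `permute_moments` and the `eps` value are hypothetical choices made for illustration.

```python
import random
import statistics


def channel_moments(values, eps=1e-5):
    """Return (mean, std) of one channel's flattened activations."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return mean, (var + eps) ** 0.5  # eps avoids division by zero (assumed value)


def permute_moments(batch, rng=None):
    """Swap per-channel mean/std between samples of a mini-batch.

    batch[n][c] is a flat list of activations for sample n, channel c.
    Returns the augmented batch and the permutation that was used.
    """
    rng = rng or random.Random()
    perm = list(range(len(batch)))
    rng.shuffle(perm)  # perm[n] is the "donor" sample for sample n

    # Compute every sample's moments before modifying anything.
    moments = [[channel_moments(ch) for ch in sample] for sample in batch]

    out = []
    for n, sample in enumerate(batch):
        new_sample = []
        for c, ch in enumerate(sample):
            mean, std = moments[n][c]
            d_mean, d_std = moments[perm[n]][c]
            # Normalize with the sample's own moments, then re-scale
            # and re-shift with the donor's moments.
            new_sample.append([(v - mean) / std * d_std + d_mean for v in ch])
        out.append(new_sample)
    return out, perm
```

Each output channel keeps the normalized pattern of its own sample while taking on the first and second moments of another sample in the batch. How the labels of the two samples are handled in the loss is not stated in the summary and is left out of this sketch.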