A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification
Published in | 2021 7th International Conference on Computer and Communications (ICCC), pp. 863 - 867
---|---
Main Authors |
Format | Conference Proceeding
Language | English
Published | IEEE, 10.12.2021
Summary | The fine-grained visual classification (FGVC) problem is to classify different subclasses within the same superclass. Because subclasses are highly similar, the problem requires capturing fine-grained discriminative features. Although current approaches can extract finer-grained features by designing complex feature extraction modules, an excessive focus on discriminative features causes massive global feature information to be ignored and reduces robustness to background noise. This paper proposes a transformer architecture based on the vision transformer (ViT) with adaptive attention (TransAA). To optimize the attention of ViT, we design two modules: an attention-weakening module that forces the model to capture more feature information, and an attention-enhancement module that strengthens the extraction of critical features. In addition, we introduce a sample-weighting loss function in the training process to adaptively adjust both the weakening and enhancement processes. The performance of TransAA is demonstrated on three benchmark fine-grained datasets.
DOI | 10.1109/ICCC54389.2021.9674560
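
The abstract does not specify how the attention-weakening and attention-enhancement modules operate, nor how the sample-weighting loss is computed. As a rough illustration of what operating on ViT attention maps could look like, here is a minimal PyTorch sketch: the function names, the use of CLS-to-patch attention, the top-k selection, the mask/amplify operations, and the per-sample weighting scheme are all assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch only: "weakening" here masks the patch tokens the CLS
# token attends to most, and "enhancement" amplifies them. None of these
# design choices come from the paper itself.
import torch
import torch.nn.functional as F

def attention_weaken(tokens: torch.Tensor, attn: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Zero out the k patch tokens the CLS token attends to most,
    pushing the model to pick up broader global features.

    tokens: (B, N, D) token embeddings, index 0 is the CLS token
    attn:   (B, H, N, N) attention weights from one ViT block
    """
    cls_attn = attn.mean(dim=1)[:, 0, 1:]            # (B, N-1) CLS -> patch attention
    topk = cls_attn.topk(k, dim=-1).indices          # most-attended patch indices
    mask = torch.ones_like(tokens[..., :1])          # (B, N, 1)
    mask.scatter_(1, (topk + 1).unsqueeze(-1), 0.0)  # +1 skips the CLS token
    return tokens * mask

def attention_enhance(tokens: torch.Tensor, attn: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Double the weight of the k most-attended patch tokens so the
    classifier head sees amplified critical features."""
    cls_attn = attn.mean(dim=1)[:, 0, 1:]
    w = torch.ones_like(cls_attn)                    # (B, N-1)
    w.scatter_(1, cls_attn.topk(k, dim=-1).indices, 2.0)
    w = torch.cat([torch.ones_like(w[:, :1]), w], dim=1)  # CLS weight stays 1
    return tokens * w.unsqueeze(-1)

def sample_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-sample weighting: samples with higher cross-entropy
    get proportionally larger weights. The abstract only states that the
    loss adaptively adjusts the weakening/enhancement processes."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    weights = ce.detach() / ce.detach().mean().clamp_min(1e-8)
    return (weights * ce).mean()

# Toy usage: batch of 2, 1 CLS + 196 patch tokens, 4 heads, embed dim 768.
B, N, H, D = 2, 197, 4, 768
tokens = torch.randn(B, N, D)
attn = torch.softmax(torch.randn(B, H, N, N), dim=-1)
weakened = attention_weaken(tokens, attn)
enhanced = attention_enhance(tokens, attn)
```

Keeping weakening and enhancement as separate token-level transforms, with a loss that reweights samples rather than modules, is just one plausible reading of the abstract; the published paper should be consulted for the actual TransAA design.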