A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification

Bibliographic Details
Published in 2021 7th International Conference on Computer and Communications (ICCC), pp. 863-867
Main Authors Cai, Changli, Zhang, Tiankui, Weng, Zhewei, Feng, Chunyan, Wang, Yapeng
Format Conference Proceeding
Language English
Published IEEE 10.12.2021

More Information
Summary: The fine-grained visual classification (FGVC) problem is to classify subclasses within the same superclass. Because subclasses closely resemble one another, the problem requires capturing fine-grained discriminative features. Although current approaches can extract finer-grained features by designing complex feature-extraction modules, their excessive focus on discriminative features ignores a large amount of global feature information and reduces robustness to background noise. This paper proposes a transformer architecture with adaptive attention (TransAA), based on the vision transformer (ViT). To optimize the attention of ViT, we design two modules: an attention-weakening module that forces the model to capture more global feature information, and an attention-enhancement module that strengthens the extraction of critical features. In addition, we introduce a sample-weighting loss function in the training process to adaptively balance the weakening and enhancement processes. The performance of TransAA is demonstrated on three benchmark fine-grained datasets.
DOI:10.1109/ICCC54389.2021.9674560
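Since this record carries only the abstract, the following is a minimal PyTorch sketch of how the two attention modules and the sample-weighting loss could look. The function names, the top-k masking/selection strategy over CLS-to-patch attention, and the (1 - p_true) sample weighting are illustrative assumptions, not the authors' published implementation.

```python
import torch

def attention_weakening(tokens: torch.Tensor, attn: torch.Tensor, k: int = 8) -> torch.Tensor:
    # tokens: (B, N, D) patch embeddings; attn: (B, N) CLS-to-patch attention weights.
    # Zero out the k most-attended patch tokens so the backbone must draw on the
    # remaining (global) features rather than a few highly discriminative parts.
    topk = attn.topk(k, dim=1).indices                          # (B, k)
    mask = torch.ones(tokens.shape[:2], device=tokens.device)   # (B, N)
    mask.scatter_(1, topk, 0.0)                                 # drop the most salient patches
    return tokens * mask.unsqueeze(-1)

def attention_enhancement(tokens: torch.Tensor, attn: torch.Tensor, k: int = 8) -> torch.Tensor:
    # Keep only the k most-attended patch tokens, concentrating the classifier
    # on the critical discriminative regions.
    topk = attn.topk(k, dim=1).indices                          # (B, k)
    idx = topk.unsqueeze(-1).expand(-1, -1, tokens.size(-1))    # (B, k, D)
    return tokens.gather(1, idx)

def sample_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # One plausible sample-weighting scheme (an assumption): scale each sample's
    # cross-entropy by (1 - p_true), so confidently classified samples contribute
    # less and hard samples more, adaptively balancing the two attention processes.
    ce = torch.nn.functional.cross_entropy(logits, targets, reduction="none")  # (B,)
    p_true = logits.softmax(dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)  # (B,)
    return ((1.0 - p_true) * ce).mean()
```

For example, with tokens of shape (4, 196, 768) and a (4, 196) attention map, attention_weakening returns the same-shaped tokens with the 8 most-attended patches zeroed, while attention_enhancement returns a (4, 8, 768) subset; how the paper actually combines the two branches is not specified in this abstract.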