Multi-directional guidance network for fine-grained visual classification


Bibliographic Details
Published in: The Visual Computer, Vol. 40, No. 11, pp. 8113-8124
Main Authors: Yang, Shengying; Jin, Yao; Lei, Jingsheng; Zhang, Shuping
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.11.2024

Summary: Fine-grained images exhibit high confusion among subclasses, and the key to classifying them is finding discriminative regions. Existing methods mainly rely on attention mechanisms or high-level linguistic information, which focus only on the feature regions with the highest response and neglect other parts, resulting in inadequate feature representation. Classification based on a single feature part alone is not reliable. A fusion mechanism can locate several different parts; however, simple feature fusion strategies do not exploit cross-layer information and make little use of low-level information. To address this limitation, we propose the multi-directional guidance network (MGN). Our network starts with a feature and attention guidance module that forces the network to learn detailed feature representations. Second, we propose a multi-layer guidance module that integrates diverse semantic information. In addition, we introduce a multi-way transfer structure that fuses low-level and high-level semantics in a novel way to improve the generalization ability of the network. Extensive experiments on the FGVC benchmark datasets (CUB-200-2011, Stanford Cars, and FGVC Aircraft) demonstrate the superior performance of our method. Our code will be available at https://github.com/syyang2022/MGN.
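The sketch below is a minimal, illustrative PyTorch example of the general cross-layer fusion idea mentioned in the summary: projecting feature maps from several backbone stages to a common width, aligning their spatial sizes, and reweighting the concatenated result with a simple channel-attention gate before classification. The module name CrossLayerFusion, the ResNet-50-style channel sizes, and the gating design are assumptions for illustration only and are not taken from the authors' MGN implementation (see the GitHub link above for the official code).

# Illustrative sketch only; not the authors' MGN code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerFusion(nn.Module):
    """Project multi-stage feature maps to a common width, align their
    spatial sizes, and fuse them with a channel-attention gate."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=512, num_classes=200):
        super().__init__()
        # 1x1 convolutions project each stage to the same channel width
        self.projs = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        fused = out_channels * len(in_channels)
        # Squeeze-and-excitation style gate over the concatenated features
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // 8, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // 8, fused, 1), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(fused, num_classes)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) maps from shallow to deep stages
        target_size = feats[0].shape[-2:]  # align to the shallowest (largest) map
        projected = [
            F.interpolate(p(f), size=target_size, mode="bilinear", align_corners=False)
            for p, f in zip(self.projs, feats)
        ]
        x = torch.cat(projected, dim=1)             # cross-layer concatenation
        x = x * self.gate(x)                        # attention-guided reweighting
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # global pooling
        return self.classifier(x)

# Usage with dummy ResNet-50-style stage outputs
if __name__ == "__main__":
    feats = [torch.randn(2, 512, 28, 28),
             torch.randn(2, 1024, 14, 14),
             torch.randn(2, 2048, 7, 7)]
    logits = CrossLayerFusion()(feats)
    print(logits.shape)  # torch.Size([2, 200])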
ISSN: 0178-2789; 1432-2315
DOI: 10.1007/s00371-023-03226-w