Strengthen contrastive semantic consistency for fine-grained image classification

Bibliographic Details
Published in	Pattern analysis and applications : PAA Vol. 28; no. 2
Main Authors	Wang, Yupeng; Wang, Yongli; Ye, Qiaolin; Lang, Wenxi; Xu, Can
Format	Journal Article
Language	English
Published	London: Springer London, 01.06.2025; Springer Nature B.V
Subjects	Classification; Computer Science; Image classification; Original Article; Pattern Recognition; Semantics; Visual discrimination; Data augmentation; Fine-grained visual classification; Semantic consistency; Discriminative representation learning; Contrastive learning
Summary:	Fine-grained visual classification (FGVC) is the task of distinguishing sub-classes within a given object category using only image-level labels as supervision. While numerous efforts have improved recognition accuracy by strengthening the discriminative features that separate subtly different sub-classes, we argue that most recent approaches still suffer from high intra-class variance: objects belonging to the same sub-class can differ so much in appearance that the model outputs different recognition results for them. To suppress this intra-class variance, we capture the semantic consistency across visual changes in intra-class images and propose a novel contrastive fine-grained visual classification network (CFGVC-Net). Based on a spatial attention map, we first embed discriminative parts to distinguish different sub-classes. We then design a semantic consistency enhancement module that applies several transformation strategies to the training images and matches the discriminative features of the generated image pairs using a center loss and a contrastive loss, which improves the model's tolerance to the distribution diversity of intra-class images. Extensive experiments on five benchmark datasets show that the proposed CFGVC-Net significantly enhances FGVC performance, demonstrating its effectiveness across diverse fine-grained classification tasks. The code is available at https://github.com/WangYuPeng1/CFGVC-Net.
ISSN:	1433-7541; 1433-755X
DOI:	10.1007/s10044-025-01456-3
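
Illustration:	The summary describes matching the features of transformed image pairs with a center loss and a contrastive loss. The following is only a minimal sketch of that general idea and is not taken from the CFGVC-Net code. The function names, the InfoNCE-style form of the contrastive term, the `temperature` value, and the `weight` that balances the two terms are all assumptions made for illustration; the article's actual formulation may differ.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(f, centers[y]))
    return 0.5 * total / len(features)


def contrastive_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of view_a should match row i of view_b."""
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of sample i in view A to every sample in view B.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(a)


def consistency_loss(view_a, view_b, labels, centers, weight=1.0):
    """Hypothetical combination: averaged center loss of both views plus a weighted contrastive term."""
    center_term = 0.5 * (
        center_loss(view_a, labels, centers) + center_loss(view_b, labels, centers)
    )
    return center_term + weight * contrastive_loss(view_a, view_b)


if __name__ == "__main__":
    # Toy features for two transformed views of the same two images (made-up values).
    view_a = [[1.0, 0.1], [0.0, 1.0]]
    view_b = [[0.9, 0.0], [0.1, 1.1]]
    labels = [0, 1]
    centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    print(consistency_loss(view_a, view_b, labels, centers))
```

In this sketch the center term pulls both views of an image toward a shared class center, while the contrastive term rewards each view for being closer to its own counterpart than to the other images in the batch.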