Dual-path Rare Content Enhancement Network for Image and Text Matching

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 10, p. 1
Main Authors: Wang, Yan; Su, Yuting; Li, Wenhui; Xiao, Jun; Li, Xuanya; Liu, An-An
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2023
Summary: Image and text matching plays a crucial role in bridging the cross-modal gap between vision and language and has made great progress thanks to deep learning. However, existing methods still suffer from the long-tail problem: only a small proportion of content carries highly frequent semantics, while a long tail consists of rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle the long-tail issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) form a dual-path structure that enhances the representation of rare content and its cross-modal association with the benefit of cross-modal prior knowledge. This structure effectively exploits complementary cross-modal relations from different aspects and fuses them adaptively through the proposed Adaptive Fusion Strategy (AFS). Moreover, we propose an alternative re-ranking strategy (ARR) that explores reciprocal contextual information to refine image-text matching results, further suppressing the negative impact of the long-tail effect. Extensive experiments on two large-scale datasets show significant improvements and validate the superiority of our method.
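The abstract does not detail how the two paths are combined, but a common realization of such adaptive fusion is a learned gate that blends the per-pair similarity scores produced by each path. The sketch below illustrates this idea under that assumption; the class name AdaptiveFusion, the gate architecture, and the convex-blend form are illustrative choices, not the paper's actual AFS formulation.

```python
# Minimal sketch of gated fusion of two matching paths (hypothetical AFS-style
# fusion); the abstract does not specify the real formulation.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse similarity scores from a representation path (CRE) and an
    association path (CAE) with a per-pair learned gate."""
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Gate network: maps the pair of path scores to a weight in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, sim_cre: torch.Tensor, sim_cae: torch.Tensor) -> torch.Tensor:
        # sim_cre, sim_cae: (num_images, num_texts) similarity matrices.
        pair = torch.stack([sim_cre, sim_cae], dim=-1)   # (I, T, 2)
        w = self.gate(pair).squeeze(-1)                  # (I, T), one gate per pair
        return w * sim_cre + (1.0 - w) * sim_cae         # adaptive convex blend

if __name__ == "__main__":
    afs = AdaptiveFusion()
    s_cre, s_cae = torch.rand(4, 5), torch.rand(4, 5)
    fused = afs(s_cre, s_cae)
    print(fused.shape)  # torch.Size([4, 5])
```

A per-pair gate of this kind lets the model lean on whichever path is more reliable for a given image-text pair, which is one plausible way to favor the association path when rare content makes the representation path less trustworthy.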
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2023.3254530