CLIP-VG: Self-Paced Curriculum Adapting of CLIP for Visual Grounding

Visual Grounding (VG) is a crucial topic in the field of vision and language, which involves locating a specific region described by expressions within an image. To reduce the reliance on manually labeled data, unsupervised methods have been developed to locate regions using pseudo-labels. However,...

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 26, pp. 4334-4347
Main Authors Xiao, Linhui; Yang, Xiaoshan; Peng, Fang; Yan, Ming; Wang, Yaowei; Xu, Changsheng
Format Journal Article
Language English
Published Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024