CLIP-VG: Self-Paced Curriculum Adapting of CLIP for Visual Grounding
Visual Grounding (VG) is a crucial topic in the field of vision and language, which involves locating a specific region described by expressions within an image. To reduce the reliance on manually labeled data, unsupervised methods have been developed to locate regions using pseudo-labels. However,...
Saved in:
Published in | IEEE transactions on multimedia Vol. 26; pp. 4334 - 4347 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
Piscataway
IEEE
2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Be the first to leave a comment!