CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
This study investigates the ability of various vision-language (VL) models to ground context-dependent and non-context-dependent verb phrases. To do that, we introduce the CV-Probes dataset, designed explicitly for studying context understanding, containing image-caption pairs with context-dependent...
Format | Journal Article |
---|---|
Language | English |
Published | 02.09.2024 |