Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?

Large vision-language models (VLMs) have become state-of-the-art for many computer vision tasks, with in-context learning (ICL) as a popular adaptation strategy for new ones. But can VLMs learn novel concepts purely from visual demonstrations, or are they limited to adapting to the output format of...

Bibliographic Details
Published in: arXiv.org
Main Authors: Bowen Zhao, Leo Parker Dirac, Paulina Varshavskaya
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 25.09.2024
