LLaFS: When Large Language Models Meet Few-Shot Segmentation

Bibliographic Details
Published in	2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3065-3075
Main Authors	Zhu, Lanyun; Chen, Tianrun; Ji, Deyi; Ye, Jieping; Liu, Jun
Format	Conference Proceeding
Language	English
Published	IEEE 16.06.2024
Subjects	Computer vision; Few-shot segmentation; Image segmentation; Large language models; Large vision-language models; Natural language processing; Pattern recognition; Training; Visualization
Summary:	This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in few-shot segmentation. In contrast to conventional few-shot segmentation methods, which rely only on the limited and biased information from the annotated support images, LLaFS leverages the vast prior knowledge gained by LLMs as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table to simulate the human visual mechanism and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pre-training to augment data and achieve better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks.
ISSN:	2575-7075
DOI:	10.1109/CVPR52733.2024.00296
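
Illustrative note: the summary states that the LLM produces segmentation results represented as polygons. The record does not give the output format, so the sketch below is only an assumption about how such text output could be turned into a binary mask. The "(x,y),(x,y),..." string format and the function names parse_polygon, point_in_polygon, and polygon_to_mask are hypothetical and are not taken from the paper.

```python
import re


def parse_polygon(text):
    """Extract integer (x, y) vertices from text such as "(1,1),(4,1)".

    The text format is an assumption made for this sketch.
    """
    pairs = re.findall(r"\((\d+)\s*,\s*(\d+)\)", text)
    return [(int(x), int(y)) for x, y in pairs]


def point_in_polygon(px, py, vertices):
    """Even-odd ray casting test for a single point."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(vertices, height, width):
    """Rasterize a polygon into a height x width list of 0/1 rows.

    Each pixel is tested at its center (column + 0.5, row + 0.5).
    """
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, vertices) else 0
         for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    verts = parse_polygon("(1,1),(4,1),(4,4),(1,4)")
    for row in polygon_to_mask(verts, 6, 6):
        print("".join(str(v) for v in row))
```

Testing pixel centers keeps the sketch dependency-free; a real pipeline would more likely rasterize with an image library and would need to handle malformed or self-intersecting polygons.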