Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Bibliographic Details
Published in	arXiv.org
Main Authors	Lee, Seung Hyun; Ke, Junjie; Li, Yinxiao; He, Junfeng; Hickson, Steven; Datsenko, Katie; Kim, Sangpil; Yang, Ming-Hsuan; Essa, Irfan; Yang, Feng
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 14.08.2024
Subjects	Aspect ratio; Context; Crops; Free form; Image enhancement; Learning; State-of-the-art reviews; Vision; Visual tasks
ISSN	2331-8422

Summary:	The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets and are difficult to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for applying VLMs to downstream vision tasks remain largely unexplored. In this paper, we propose an effective approach that leverages VLMs for image cropping. First, we propose an efficient prompt retrieval mechanism that automates the selection of in-context examples for image cropping. Second, we introduce an iterative refinement strategy that progressively improves the predicted crops. The proposed framework, named Cropper, is applicable to a wide range of cropping tasks, including free-form cropping, subject-aware cropping, and aspect ratio-aware cropping. Extensive experiments and a user study demonstrate that Cropper significantly outperforms state-of-the-art methods across several benchmarks.
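
The two components named in the summary, prompt retrieval and iterative refinement, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the record gives no details, so the similarity measure (cosine over precomputed embeddings), the function names `retrieve_examples` and `refine_crop`, and the `propose` and `score` callables (stand-ins for a VLM call and a crop-quality scorer) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical crop representation: (x1, y1, x2, y2), normalized to [0, 1].
Crop = Tuple[float, float, float, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_examples(
    query: Sequence[float],
    pool: List[Tuple[Sequence[float], Crop]],
    k: int,
) -> List[Crop]:
    """Assumed retrieval step: return the crops of the k pool images whose
    embeddings are most similar to the query embedding."""
    ranked = sorted(pool, key=lambda item: cosine(query, item[0]), reverse=True)
    return [crop for _, crop in ranked[:k]]


def refine_crop(
    candidates: List[Crop],
    propose: Callable[[Crop], Crop],
    score: Callable[[Crop], float],
    steps: int,
) -> Crop:
    """Assumed refinement loop: start from the best candidate, ask `propose`
    (a placeholder for a VLM call) for a new crop, keep it only if it scores
    higher under `score` (a placeholder for a quality scorer)."""
    best = max(candidates, key=score)
    for _ in range(steps):
        proposal = propose(best)
        if score(proposal) > score(best):
            best = proposal
    return best
```

In this sketch the retrieved crops would serve as in-context examples for the model, and the loop keeps a proposal only when the scorer prefers it, so the returned crop never scores lower than the best starting candidate.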