Salient Object-Aware Background Generation using Text-Guided Diffusion Models

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7489-7499
Main Authors: Eshratifar, Amir Erfan; Soares, Joao V. B.; Thadani, Kapil; Mishra, Shaunak; Kuznetsov, Mikhail; Ku, Yueh-Ning; De Juan, Paloma
Format: Conference Proceeding
Language: English
Published: IEEE, 17.06.2024

Summary: Generating background scenes for salient objects plays a crucial role across various domains, including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, a phenomenon we call "object expansion." This paper introduces a model that adapts inpainting diffusion models to the salient object outpainting task using the Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric that measures object expansion without requiring any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6× on average, with no degradation in standard visual metrics across multiple datasets.
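
The mask-inversion baseline the summary refers to can be reproduced with off-the-shelf tools. Below is a minimal sketch using Hugging Face diffusers and the Stable Diffusion 2.0 inpainting checkpoint the abstract compares against; the file names, prompt, and 512×512 resolution are assumptions for illustration, and this is the baseline setup prone to object expansion, not the paper's proposed ControlNet-based model.

```python
# Text-conditioned outpainting via mask inversion with a standard
# inpainting pipeline (the baseline setup described in the abstract).
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# object.png: the salient object composited on a blank background.
# object_mask.png: white where the object is, black elsewhere.
image = Image.open("object.png").convert("RGB").resize((512, 512))
object_mask = Image.open("object_mask.png").convert("L").resize((512, 512))

# Invert the mask: the inpainting pipeline fills the white regions, so to
# generate a background around the object we mark everything *except*
# the object as missing.
background_mask = ImageOps.invert(object_mask)

result = pipe(
    prompt="a product photo on a rustic wooden table, soft studio lighting",
    image=image,
    mask_image=background_mask,
).images[0]
result.save("outpainted.png")
```

Because such a pipeline is trained to complete missing image regions rather than to place an object into a scene, the generated background often grows the object past its mask, which is exactly the "object expansion" failure mode the paper targets.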
ISSN: 2160-7516
DOI: 10.1109/CVPRW63382.2024.00744
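
The abstract's label-free object expansion metric is not defined in this record. One plausible reading is to re-run a salient object detector on the generated image and measure how far the detected object mask grows beyond the input mask. The sketch below follows that assumption; the function name, the choice of detector, and the exact ratio are hypothetical illustrations, not the paper's definition.

```python
import numpy as np

def expansion_ratio(original_mask: np.ndarray, detected_mask: np.ndarray) -> float:
    """Fraction of the original object's area by which the salient object
    appears to have grown in the generated image.

    original_mask: boolean mask of the input salient object.
    detected_mask: boolean mask from a salient object detector (e.g., an
                   off-the-shelf model such as U2-Net) run on the output.
    """
    original = original_mask.astype(bool)
    detected = detected_mask.astype(bool)
    # Pixels detected as object that lie outside the original object mask.
    expansion = np.logical_and(detected, np.logical_not(original)).sum()
    return float(expansion) / max(int(original.sum()), 1)
```

Averaged over a dataset, lower values would indicate less object expansion; under the paper's own metric, the abstract reports a 3.6× average reduction relative to Stable Diffusion 2.0 Inpainting.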