Salient Object-Aware Background Generation using Text-Guided Diffusion Models

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7489-7499
Main Authors: Eshratifar, Amir Erfan; Soares, Joao V. B.; Thadani, Kapil; Mishra, Shaunak; Kuznetsov, Mikhail; Ku, Yueh-Ning; De Juan, Paloma
Format: Conference Proceeding
Language: English
Published: IEEE, 17.06.2024

Summary: Generating background scenes for salient objects plays a crucial role across various domains, including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object's boundaries and thereby change the object's identity, a phenomenon we call "object expansion." This paper introduces a model that adapts inpainting diffusion models to the salient object outpainting task using the Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric that measures object expansion without requiring any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6× on average, with no degradation in standard visual metrics across multiple datasets.
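
The mask-inversion baseline the summary refers to can be reproduced with off-the-shelf tools. Below is a minimal sketch using Hugging Face diffusers and the Stable Diffusion 2.0 inpainting checkpoint the abstract compares against; the file names, prompt, and 512×512 resolution are assumptions for illustration, and this is the baseline setup prone to object expansion, not the paper's proposed ControlNet-based model.

```python
# Text-conditioned outpainting via mask inversion with a standard
# inpainting pipeline (the baseline setup described in the abstract).
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# object.png: the salient object composited on a blank background.
# object_mask.png: white where the object is, black elsewhere.
image = Image.open("object.png").convert("RGB").resize((512, 512))
object_mask = Image.open("object_mask.png").convert("L").resize((512, 512))

# Invert the mask: the inpainting pipeline fills the white regions, so to
# generate a background around the object we mark everything *except*
# the object as missing.
background_mask = ImageOps.invert(object_mask)

result = pipe(
    prompt="a product photo on a rustic wooden table, soft studio lighting",
    image=image,
    mask_image=background_mask,
).images[0]
result.save("outpainted.png")
```

Because such a pipeline is trained to complete missing image regions rather than to place an object into a scene, the generated background often grows the object past its mask, which is exactly the "object expansion" failure mode the paper targets.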
ISSN: 2160-7516
DOI: 10.1109/CVPRW63382.2024.00744
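
The abstract's label-free object expansion metric is not defined in this record. One plausible reading is to re-run a salient object detector on the generated image and measure how far the detected object mask grows beyond the input mask. The sketch below follows that assumption; the function name, the choice of detector, and the exact ratio are hypothetical illustrations, not the paper's definition.

```python
import numpy as np

def expansion_ratio(original_mask: np.ndarray, detected_mask: np.ndarray) -> float:
    """Fraction of the original object's area by which the salient object
    appears to have grown in the generated image.

    original_mask: boolean mask of the input salient object.
    detected_mask: boolean mask from a salient object detector (e.g., an
                   off-the-shelf model such as U2-Net) run on the output.
    """
    original = original_mask.astype(bool)
    detected = detected_mask.astype(bool)
    # Pixels detected as object that lie outside the original object mask.
    expansion = np.logical_and(detected, np.logical_not(original)).sum()
    return float(expansion) / max(int(original.sum()), 1)
```

Averaged over a dataset, lower values would indicate less object expansion; under the paper's own metric, the abstract reports a 3.6× average reduction relative to Stable Diffusion 2.0 Inpainting.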