FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 05.07.2024 |
Summary: | We offer a novel approach to image composition, which integrates multiple
input images into a single, coherent image. Rather than concentrating on
specific use cases such as appearance editing (image harmonization) or semantic
editing (semantic image composition), we showcase the potential of utilizing
the powerful generative prior inherent in large-scale pre-trained diffusion
models to accomplish generic image composition applicable to both scenarios. We
observe that pre-trained diffusion models automatically identify simple
copy-paste boundary areas as low-density regions during denoising. Building on
this insight, we propose to optimize the composed image towards high-density
regions guided by the diffusion prior. In addition, we introduce a novel
mask-guided loss to further enable flexible semantic image composition.
Extensive experiments validate the superiority of our approach in achieving
generic zero-shot image composition. Additionally, our approach shows promising
potential in various tasks, such as object removal and multi-concept
customization. |
---|---|
DOI: | 10.48550/arxiv.2407.04947 |
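The optimization the summary describes, nudging a copy-paste composite toward high-density regions of a diffusion prior while a spatial mask restricts which pixels are edited, can be sketched in a score-distillation style. The sketch below is illustrative only: the `predict_noise` callable stands in for a real pre-trained denoiser, the toy prior used in the usage example is an assumption for demonstration, and none of the names correspond to the paper's actual implementation.

```python
import numpy as np

def sds_style_update(image, mask, predict_noise, num_steps=100, lr=0.1, seed=0):
    """Toy sketch of score-distillation-style image optimization.

    At each step we perturb the current image with Gaussian noise, ask the
    (stand-in) diffusion prior to predict that noise, and step the image
    against the residual (eps_hat - noise). The residual points away from
    high-density regions of the prior, so subtracting it pulls the image
    toward them; multiplying by `mask` confines the edit to masked pixels.
    """
    rng = np.random.default_rng(seed)
    x = image.astype(float).copy()
    sigma = 0.5  # fixed noise level for this toy sketch
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        x_noisy = x + sigma * noise
        eps_hat = predict_noise(x_noisy, sigma)  # prior's noise estimate
        grad = (eps_hat - noise) * mask          # masked SDS-style gradient
        x -= lr * grad
    return x
```

As a usage example with an assumed toy prior whose density concentrates at a constant value `c` (for which the optimal denoiser predicts `(x_noisy - c) / sigma`), the masked region converges to `c` while unmasked pixels are untouched:

```python
img = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[:2, :] = 1.0                                  # only edit the top half
predict = lambda x_noisy, sigma: (x_noisy - 1.0) / sigma
out = sds_style_update(img, mask, predict, num_steps=200, lr=0.05)
# top half is pulled to 1.0; bottom half stays 0.0
```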