Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models
| Published in | Applied Sciences, Vol. 14, No. 8, p. 3338 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Basel: MDPI AG, 01.04.2024 |
Summary: Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As the demand for customized models grows, various methods have emerged to capture appearance features. However, the exploration of relations between entities, another crucial aspect of images, has been limited. This study focuses on enabling models to capture and generate high-level semantic images with specific relation concepts, which is a challenging task. To this end, we introduce the Inv-ReVersion framework, which uses inverse relations text expansion to separate the feature fusion of multiple entities in images. Additionally, we employ a weighted contrastive loss to emphasize part of speech, helping the model learn more abstract relation concepts. We also propose a high-frequency suppressor to reduce the time spent on learning low-frequency details, enhancing the model's ability to generate image relations. Compared to existing baselines, our approach can more accurately generate relation concepts between entities without additional computational costs, especially in capturing abstract relation concepts.
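The weighted contrastive loss mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes an InfoNCE-style objective in which each positive token embedding carries a weight derived from its part of speech (e.g., relation-bearing prepositions or verbs weighted higher than nouns), so gradient signal is concentrated on relation words. All names and the weighting scheme here are hypothetical.

```python
import numpy as np

def weighted_contrastive_loss(anchor, positives, negatives, weights, tau=0.1):
    """InfoNCE-style contrastive loss with per-positive weights.

    anchor     : embedding of the learned relation token (1-D array)
    positives  : embeddings that should align with the anchor
    negatives  : embeddings to push away from the anchor
    weights    : one scalar per positive; relation-bearing parts of
                 speech would get larger weights (hypothetical scheme)
    tau        : softmax temperature
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    losses = []
    for pos, w in zip(positives, weights):
        num = np.exp(cos(anchor, pos) / tau)
        den = num + sum(np.exp(cos(anchor, n) / tau) for n in negatives)
        # Weight scales this positive pair's contribution to the loss.
        losses.append(-w * np.log(num / den))
    return float(np.mean(losses))

# Toy example: the anchor aligns with the positive, not the negative.
anchor = np.array([1.0, 0.0])
positives = [np.array([0.9, 0.1])]
negatives = [np.array([0.0, 1.0])]
loss_lo = weighted_contrastive_loss(anchor, positives, negatives, [1.0])
loss_hi = weighted_contrastive_loss(anchor, positives, negatives, [2.0])
```

Doubling the part-of-speech weight doubles that pair's loss contribution, which is the mechanism by which training emphasis can be shifted onto relation words.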
ISSN: 2076-3417
DOI: 10.3390/app14083338