Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models


Bibliographic Details
Published in: Applied Sciences, Vol. 14, No. 8, p. 3338
Main Authors: Zhang, Guangzi; Qian, Yulin; Deng, Juntao; Cai, Xingquan
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.04.2024

More Information
Summary: Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As demand for customized models grows, various methods have emerged to capture appearance features. However, relations between entities, another crucial aspect of images, remain underexplored. This study focuses on enabling models to capture and generate high-level semantic images with specific relation concepts, which is a challenging task. To this end, the authors introduce the Inv-ReVersion framework, which uses inverse-relation text expansion to separate the feature fusion of multiple entities in images. Additionally, they employ a weighted contrastive loss that emphasizes part of speech, helping the model learn more abstract relation concepts. They also propose a high-frequency suppressor that reduces the time spent learning low-frequency details, enhancing the model's ability to generate image relations. Compared to existing baselines, the approach more accurately generates relation concepts between entities without additional computational cost, especially for abstract relation concepts.
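The record does not include the paper's equations, but the "weighted contrastive loss to emphasize part of speech" can be illustrated with a generic InfoNCE-style loss in which each positive pair carries a scalar weight, e.g. giving relation-bearing tokens (verbs, prepositions) more influence than appearance tokens. The function below is a minimal NumPy sketch under that assumption; the function name, weighting scheme, and temperature value are illustrative, not the paper's exact formulation.

```python
import numpy as np

def weighted_contrastive_loss(anchor, positives, negatives, weights, tau=0.1):
    """InfoNCE-style contrastive loss with per-positive weights.

    anchor:    (d,)   embedding of the learned relation token
    positives: (P, d) embeddings that should attract the anchor
    negatives: (N, d) embeddings that should repel the anchor
    weights:   (P,)   per-positive weights (hypothetically, higher for
                      relation-bearing parts of speech such as prepositions)
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positives, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_sim = p @ a / tau            # (P,) cosine similarities to positives
    neg_sim = n @ a / tau            # (N,) cosine similarities to negatives

    # Shared log-partition over all pairs (numerically stable log-sum-exp).
    all_sim = np.concatenate([pos_sim, neg_sim])
    m = all_sim.max()
    log_denom = m + np.log(np.exp(all_sim - m).sum())

    per_pos = -(pos_sim - log_denom)  # per-positive InfoNCE terms
    w = np.asarray(weights, dtype=float)
    return float((w * per_pos).sum() / w.sum())
```

Upweighting the positives that carry the relation concept makes their alignment dominate the loss, which matches the stated goal of steering the learned token toward abstract relational semantics rather than entity appearance.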
ISSN: 2076-3417
DOI: 10.3390/app14083338