Training-free subject-enhanced attention guidance for compositional text-to-image generation
Published in | Pattern Recognition Vol. 170; p. 112111 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.02.2026 |
Subjects | |
Summary: | • Propose a zero-shot diffusion-based framework for the subject-driven generation task. • Introduce a training-free subject-enhanced attention guidance. • Propose a novel evaluation metric, GroundingScore, for comprehensive assessment.
Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. When generating compositional subjects, they often encounter problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance that intervenes in the generative process at inference time. This approach strengthens the attention map, enabling precise attribute binding and feature injection for each subject. Notably, our method exhibits strong zero-shot generation ability, especially on the challenging task of compositional generation. Furthermore, we propose a novel GroundingScore metric to thoroughly assess subject alignment. The quantitative results provide compelling evidence of the effectiveness of the proposed method. |
---|---|
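The abstract's core idea, strengthening the cross-attention map toward subject tokens during inference, can be illustrated with a rough, hypothetical sketch. This is not the authors' implementation; the function name, the additive `boost` knob, and the toy shapes are all assumptions made for illustration only:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def subject_enhanced_attention(logits, subject_token_ids, boost=2.0):
    """Bias cross-attention toward subject tokens at inference time.

    logits: (num_image_patches, num_text_tokens) cross-attention logits.
    subject_token_ids: indices of the prompt tokens belonging to a subject.
    boost: additive bias on the subject columns (hypothetical knob).
    """
    guided = logits.copy()
    guided[:, subject_token_ids] += boost  # shift attention mass toward the subject
    return softmax(guided, axis=-1)

# Toy example: 4 image patches attending over a 6-token prompt,
# where tokens 2-3 describe one subject.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))
attn = subject_enhanced_attention(logits, subject_token_ids=[2, 3])
# Each row still sums to 1, but mass on tokens 2-3 is amplified
# relative to plain softmax(logits).
```

In a real diffusion pipeline this intervention would be applied inside the U-Net's cross-attention layers at selected denoising steps; here a single static attention matrix stands in for that process.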
ISSN: | 0031-3203 |
DOI: | 10.1016/j.patcog.2025.112111 |