Interpretable Diffusion via Information Decomposition
Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what relationships between words and parts of an image are captured, or...
Saved in:
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
11.10.2023
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Denoising diffusion models enable conditional generation and density modeling
of complex relationships like images and text. However, the nature of the
learned relationships is opaque making it difficult to understand precisely
what relationships between words and parts of an image are captured, or to
predict the effect of an intervention. We illuminate the fine-grained
relationships learned by diffusion models by noticing a precise relationship
between diffusion and information decomposition. Exact expressions for mutual
information and conditional mutual information can be written in terms of the
denoising model. Furthermore, pointwise estimates can be easily estimated as
well, allowing us to ask questions about the relationships between specific
images and captions. Decomposing information even further to understand which
variables in a high-dimensional space carry information is a long-standing
problem. For diffusion models, we show that a natural non-negative
decomposition of mutual information emerges, allowing us to quantify
informative relationships between words and pixels in an image. We exploit
these new relations to measure the compositional understanding of diffusion
models, to do unsupervised localization of objects in images, and to measure
effects when selectively editing images through prompt interventions. |
---|---|
DOI: | 10.48550/arxiv.2310.07972 |