There and Back Again: On the relation between noises, images, and their inversions in diffusion models

Bibliographic Details
Main Authors: Staniszewski, Łukasz; Kuciński, Łukasz; Deja, Kamil
Format: Journal Article
Language: English
Published: 30.10.2024
Online Access: https://arxiv.org/abs/2410.23530

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) achieve state-of-the-art performance in synthesizing new images from random noise, but they lack a meaningful latent space that encodes data into features. Recent DDPM-based editing techniques try to mitigate this issue by inverting images back to their approximated starting noise. In this work, we study the relation between the initial Gaussian noise, the samples generated from it, and their corresponding latent encodings obtained through the inversion procedure. First, we interpret their spatial distance relations to show the inaccuracy of the DDIM inversion technique by localizing the manifold of latent representations between the initial noise and the generated samples. Then, we demonstrate the peculiar relation between the initial Gaussian noise and its corresponding generations during diffusion training, showing that the high-level features of generated images stabilize rapidly, keeping the spatial distance relationship between noises and generations consistent throughout training.
BackLink https://doi.org/10.48550/arXiv.2410.23530 (View paper in arXiv)
ContentType Journal Article
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2410.23530
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
OpenAccessLink https://arxiv.org/abs/2410.23530
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition