DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 22500-22510
Main Authors Ruiz, Nataniel; Li, Yuanzhen; Jampani, Varun; Pritch, Yael; Rubinstein, Michael; Aberman, Kfir
Format Conference Proceeding
Language English
Published IEEE 01.01.2023
Abstract Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/
Author_xml – sequence: 1
  givenname: Nataniel
  surname: Ruiz
  fullname: Ruiz, Nataniel
  organization: Google Research
– sequence: 2
  givenname: Yuanzhen
  surname: Li
  fullname: Li, Yuanzhen
  organization: Google Research
– sequence: 3
  givenname: Varun
  surname: Jampani
  fullname: Jampani, Varun
  organization: Google Research
– sequence: 4
  givenname: Yael
  surname: Pritch
  fullname: Pritch, Yael
  organization: Google Research
– sequence: 5
  givenname: Michael
  surname: Rubinstein
  fullname: Rubinstein, Michael
  organization: Google Research
– sequence: 6
  givenname: Kfir
  surname: Aberman
  fullname: Aberman, Kfir
  organization: Google Research
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52729.2023.02155
Discipline Applied Sciences
EISBN 9798350301298
EISSN 1063-6919
EndPage 22510
ExternalDocumentID 10204880
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 11
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SubjectTerms Computer vision
Image and video synthesis and generation
Lighting
Pattern recognition
Protocols
Rendering (computer graphics)
Semantics
Task analysis
Title DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
URI https://ieeexplore.ieee.org/document/10204880