DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 22500 - 22510
Main Authors	Ruiz, Nataniel, Li, Yuanzhen, Jampani, Varun, Pritch, Yael, Rubinstein, Michael, Aberman, Kfir
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2023
Subjects	Computer vision Image and video synthesis and generation Lighting Pattern recognition Protocols Rendering (computer graphics) Semantics Task analysis

Abstract	Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/
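The abstract names two mechanisms: binding a unique identifier to the subject, and a class-specific prior preservation loss. The sketch below illustrates both under stated assumptions; it is not the authors' code. It assumes the following:

- The subject prompt has the form "a [identifier] [class noun]", while the prior prompt drops the identifier.
- The model predicts the clean image, and each image is flattened to a plain list of floats.
- All per-timestep weights are 1.
- The loss is a reconstruction term on the subject images plus a lambda-weighted reconstruction term on images that the frozen pretrained model generated from the class prompt.

The token "sks" and the names `make_prompts`, `mse`, `prior_preservation_loss`, and `lam` are hypothetical.

```python
from typing import Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def make_prompts(identifier: str, class_noun: str) -> Tuple[str, str]:
    """Return (subject prompt, class prompt).

    The subject prompt pairs a hypothetical rare identifier token with the
    class noun; the class prompt omits the identifier and is the prompt used
    to sample prior-preservation images from the frozen pretrained model.
    """
    return f"a {identifier} {class_noun}", f"a {class_noun}"


def prior_preservation_loss(
    pred_subject: Sequence[float],
    target_subject: Sequence[float],
    pred_prior: Sequence[float],
    target_prior: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Subject reconstruction loss plus lam times the prior reconstruction loss.

    Assumption: unit timestep weights and clean-image prediction, so each
    term reduces to a plain MSE between prediction and target image.
    """
    return mse(pred_subject, target_subject) + lam * mse(pred_prior, target_prior)


subject_prompt, class_prompt = make_prompts("sks", "dog")
loss = prior_preservation_loss([0.1, 0.2], [0.0, 0.2], [1.0], [0.0], lam=0.5)
```

In this toy call, the subject term is 0.005 and the prior term contributes 0.5 x 1.0, giving roughly 0.505. Raising `lam` pushes fine-tuning to keep the class prior intact rather than overfitting to the few reference images.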
Affiliation	Google Research (all six authors, in order: Ruiz, Nataniel; Li, Yuanzhen; Jampani, Varun; Pritch, Yael; Rubinstein, Michael; Aberman, Kfir)
CODEN	IEEPAD
DOI	10.1109/CVPR52729.2023.02155
Discipline	Applied Sciences
EISBN	9798350301298
EISSN	1063-6919
EndPage	22510
ExternalDocumentID	10204880
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	11
PublicationDate	2023-01-01
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationYear	2023
Publisher	IEEE
StartPage	22500
URI	https://ieeexplore.ieee.org/document/10204880