Feedback Efficient Online Fine-Tuning of Diffusion Models
Published in | arXiv.org |
---|---|
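Main Authors | Uehara, Masatoshi; Zhao, Yulai; Black, Kevin; Hajiramezanali, Ehsan; Scalia, Gabriele; Nathaniel Lee Diamant; Tseng, Alex M; Levine, Sergey; Biancalani, Tommaso |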
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 18.07.2024 |
Subjects | Empirical analysis; Image quality |
Online Access | Get full text |
Abstract | Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example, we may want to generate images with high aesthetic quality, or molecules with high bioactivity. It is natural to frame this as a reinforcement learning (RL) problem, in which the objective is to fine-tune a diffusion model to maximize a reward function that corresponds to some property. Even with access to online queries of the ground-truth reward function, efficiently discovering high-reward samples can be challenging: they might have a low probability in the initial distribution, and there might be many infeasible samples that do not even have a well-defined reward (e.g., unnatural images or physically impossible molecules). In this work, we propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples. We present a theoretical analysis providing a regret guarantee, as well as empirical validation across three domains: images, biological sequences, and molecules. |
---|---|
Author | Black, Kevin; Biancalani, Tommaso; Nathaniel Lee Diamant; Tseng, Alex M; Levine, Sergey; Uehara, Masatoshi; Hajiramezanali, Ehsan; Zhao, Yulai; Scalia, Gabriele |
Author_xml | – sequence: 1 givenname: Masatoshi surname: Uehara fullname: Uehara, Masatoshi – sequence: 2 givenname: Yulai surname: Zhao fullname: Zhao, Yulai – sequence: 3 givenname: Kevin surname: Black fullname: Black, Kevin – sequence: 4 givenname: Ehsan surname: Hajiramezanali fullname: Hajiramezanali, Ehsan – sequence: 5 givenname: Gabriele surname: Scalia fullname: Scalia, Gabriele – sequence: 6 fullname: Nathaniel Lee Diamant – sequence: 7 givenname: Alex surname: Tseng middlename: M fullname: Tseng, Alex M – sequence: 8 givenname: Sergey surname: Levine fullname: Levine, Sergey – sequence: 9 givenname: Tommaso surname: Biancalani fullname: Biancalani, Tommaso |
ContentType | Paper |
Copyright | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DBID | 8FE 8FG ABJCF ABUWG AFKRA AZQEC BENPR BGLVJ CCPQU DWQXO HCIFZ L6V M7S PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
DatabaseName | ProQuest SciTech Collection ProQuest Technology Collection Materials Science & Engineering Collection ProQuest Central (Alumni) ProQuest Central ProQuest Central Essentials ProQuest Central Technology Collection ProQuest One Community College ProQuest Central Korea SciTech Premium Collection ProQuest Engineering Collection Engineering Database Publicly Available Content Database ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China Engineering Collection |
DatabaseTitle | Publicly Available Content Database Engineering Database Technology Collection ProQuest Central Essentials ProQuest One Academic Eastern Edition ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Technology Collection ProQuest SciTech Collection ProQuest Central China ProQuest Central ProQuest Engineering Collection ProQuest One Academic UKI Edition ProQuest Central Korea Materials Science & Engineering Collection ProQuest One Academic Engineering Collection |
DatabaseTitleList | Publicly Available Content Database |
Database_xml | – sequence: 1 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
GroupedDBID | 8FE 8FG ABJCF ABUWG AFKRA ALMA_UNASSIGNED_HOLDINGS AZQEC BENPR BGLVJ CCPQU DWQXO FRJ HCIFZ L6V M7S M~E PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
ID | FETCH-proquest_journals_29326086923 |
IEDL.DBID | BENPR |
IngestDate | Thu Oct 10 22:45:09 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-proquest_journals_29326086923 |
OpenAccessLink | https://www.proquest.com/docview/2932608692?pq-origsite=%requestingapplication% |
PQID | 2932608692 |
PQPubID | 2050157 |
ParticipantIDs | proquest_journals_2932608692 |
PublicationCentury | 2000 |
PublicationDate | 20240718 |
PublicationDateYYYYMMDD | 2024-07-18 |
PublicationDate_xml | – month: 07 year: 2024 text: 20240718 day: 18 |
PublicationDecade | 2020 |
PublicationPlace | Ithaca |
PublicationPlace_xml | – name: Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2024 |
Publisher | Cornell University Library, arXiv.org |
Publisher_xml | – name: Cornell University Library, arXiv.org |
SSID | ssj0002672553 |
Score | 3.548039 |
SecondaryResourceType | preprint |
SourceID | proquest |
SourceType | Aggregation Database |
SubjectTerms | Empirical analysis Image quality |
Title | Feedback Efficient Online Fine-Tuning of Diffusion Models |
URI | https://www.proquest.com/docview/2932608692 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | ProQuest |