BoKDiff: best-of-K diffusion alignment for target-specific 3D molecule generation
Published in | Bioinformatics advances Vol. 5; no. 1; p. vbaf137 |
---|---|
Main Authors | Khodabandeh Yalabadi, Ali; Yazdani-Jahromi, Mehdi; Garibay, Ozlem Ozmen |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press, 01.01.2025 |
ISSN | 2635-0041 |
DOI | 10.1093/bioadv/vbaf137 |
Abstract | Structure-based drug design (SBDD) leverages the 3D structure of target proteins to guide therapeutic development. While generative models like diffusion models and geometric deep learning show promise in ligand design, challenges such as limited protein-ligand data and poor alignment reduce their effectiveness. We introduce BoKDiff, a domain-adapted framework inspired by alignment strategies in large language and vision models that combines multi-objective optimization with Best-of-K alignment to enhance ligand generation.
Built on DecompDiff, BoKDiff generates diverse ligands and ranks them using a weighted score based on QED, SA, and docking metrics. To overcome alignment issues, we reposition each ligand's center of mass to match its docking pose, enabling more accurate sub-component extraction. We further incorporate a Best-of-N (BoN) sampling strategy to select optimal candidates without model fine-tuning. BoN achieves QED > 0.6, SA > 0.75, and over 35% success rate. BoKDiff outperforms prior models on the CrossDocked2020 dataset with an average docking score of -8.58 and 26% valid molecule generation rate. This is the first study to integrate Best-of-K alignment and BoN sampling into SBDD, demonstrating their potential for practical, high-quality ligand design.
Code is available at https://github.com/khodabandeh-ali/BoKDiff.git. |
---|---|
Author | Khodabandeh Yalabadi, Ali; Yazdani-Jahromi, Mehdi; Garibay, Ozlem Ozmen |
Copyright | The Author(s) 2025. Published by Oxford University Press. |
Discipline | Biology |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | https://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
ORCID | 0000-0001-9215-694X 0009-0009-4310-0259 |
OpenAccessLink | http://dx.doi.org/10.1093/bioadv/vbaf137 |
PMID | 40621602 |
PublicationTitleAlternate | Bioinform Adv |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40621602 https://www.proquest.com/docview/3227638437 https://pubmed.ncbi.nlm.nih.gov/PMC12228967 |
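
The abstract describes two mechanical steps: ranking generated ligands by a weighted score over QED, SA, and docking, and repositioning a ligand so that its center of mass matches its docking pose. The following is a minimal Python sketch of both steps, not the authors' implementation. The `Candidate` class, the function names, the equal default weights, the `dock_scale` normalization, and the use of an unweighted centroid in place of a mass-weighted center are all assumptions made for illustration; the published code at the repository linked in the abstract is the authoritative source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Candidate:
    """Hypothetical container for one generated ligand and its metrics."""
    name: str
    qed: float      # drug-likeness in [0, 1], higher is better
    sa: float       # normalized synthetic accessibility in [0, 1], higher is better
    docking: float  # docking score, more negative is better


def weighted_score(c: Candidate, w_qed: float = 1.0, w_sa: float = 1.0,
                   w_dock: float = 1.0, dock_scale: float = 12.0) -> float:
    """Combine the three metrics into one number (higher is better).

    Assumption: the docking score is negated, divided by dock_scale, and
    clipped to [0, 1] so that it is comparable to QED and SA.
    """
    dock_term = min(max(-c.docking / dock_scale, 0.0), 1.0)
    return w_qed * c.qed + w_sa * c.sa + w_dock * dock_term


def best_of_k(candidates: Sequence[Candidate], k: int = 1,
              **weights: float) -> List[Candidate]:
    """Rank candidates by weighted score and keep the top k."""
    ranked = sorted(candidates,
                    key=lambda c: weighted_score(c, **weights),
                    reverse=True)
    return ranked[:k]


def centroid(points: Sequence[Point]) -> Point:
    """Unweighted mean position (assumption: atomic masses are ignored)."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def recenter(coords: Sequence[Point], pose_coords: Sequence[Point]) -> List[Point]:
    """Translate coords so their centroid coincides with the pose centroid."""
    src = centroid(coords)
    dst = centroid(pose_coords)
    shift = (dst[0] - src[0], dst[1] - src[1], dst[2] - src[2])
    return [(p[0] + shift[0], p[1] + shift[1], p[2] + shift[2]) for p in coords]
```

Because selection happens after sampling, a scheme like this needs no change to the generative model itself, which is consistent with the abstract's statement that BoN sampling selects candidates without model fine-tuning.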