BoKDiff: best-of-K diffusion alignment for target-specific 3D molecule generation

Bibliographic Details
Published in Bioinformatics advances Vol. 5; no. 1; p. vbaf137
Main Authors Khodabandeh Yalabadi, Ali; Yazdani-Jahromi, Mehdi; Garibay, Ozlem Ozmen
Format Journal Article
Language English
Published England: Oxford University Press, 01.01.2025
ISSN 2635-0041
DOI 10.1093/bioadv/vbaf137

Abstract

Motivation: Structure-based drug design (SBDD) leverages the 3D structure of target proteins to guide therapeutic development. While generative models like diffusion models and geometric deep learning show promise in ligand design, challenges such as limited protein-ligand data and poor alignment reduce their effectiveness. We introduce BoKDiff, a domain-adapted framework inspired by alignment strategies in large language and vision models that combines multi-objective optimization with Best-of-K alignment to enhance ligand generation.

Results: Built on DecompDiff, BoKDiff generates diverse ligands and ranks them using a weighted score based on QED, SA, and docking metrics. To overcome alignment issues, we reposition each ligand's center of mass to match its docking pose, enabling more accurate sub-component extraction. We further incorporate a Best-of-N (BoN) sampling strategy to select optimal candidates without model fine-tuning. BoN achieves QED > 0.6, SA > 0.75, and over 35% success rate. BoKDiff outperforms prior models on the CrossDocked2020 dataset with an average docking score of -8.58 and 26% valid molecule generation rate. This is the first study to integrate Best-of-K alignment and BoN sampling into SBDD, demonstrating their potential for practical, high-quality ligand design.

Availability and implementation: Code is available at https://github.com/khodabandeh-ali/BoKDiff.git.
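The ranking and repositioning steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` fields, the equal default weights, the `dock_floor` normalization constant, and every function name below are hypothetical choices made only for illustration. The sketch assumes QED and SA are already scaled to [0, 1] with higher values being better, and that docking scores are negative with lower values being better.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Candidate:
    name: str
    qed: float      # drug-likeness in [0, 1], higher is better (assumed scaling)
    sa: float       # normalized synthetic accessibility in [0, 1], higher is better
    docking: float  # docking score, more negative is better


def weighted_score(c: Candidate, w_qed: float = 1.0, w_sa: float = 1.0,
                   w_dock: float = 1.0, dock_floor: float = -12.0) -> float:
    """Combine the three metrics into one scalar (weights are assumptions)."""
    # Map the docking score onto [0, 1]; dock_floor is a hypothetical best-case value.
    dock_term = min(max(c.docking / dock_floor, 0.0), 1.0)
    return w_qed * c.qed + w_sa * c.sa + w_dock * dock_term


def best_of_k(candidates: Sequence[Candidate], k: int = 1, **weights) -> List[Candidate]:
    """Return the k highest-scoring candidates among the generated samples."""
    ranked = sorted(candidates, key=lambda c: weighted_score(c, **weights), reverse=True)
    return ranked[:k]


def center_of_mass(coords: Sequence[Coord], masses: Sequence[float]) -> Coord:
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(sum(m * p[i] for p, m in zip(coords, masses)) / total
                 for i in range(3))


def recenter(coords: Sequence[Coord], masses: Sequence[float], target: Coord) -> List[Coord]:
    """Rigidly translate a ligand so its center of mass coincides with target."""
    com = center_of_mass(coords, masses)
    shift = tuple(t - c for t, c in zip(target, com))
    return [tuple(p[i] + shift[i] for i in range(3)) for p in coords]
```

In this sketch, `recenter` would be called with the center of mass of the docked pose as `target`, and `best_of_k` would then pick the top candidates from a batch of samples drawn for the same pocket. Selection happens purely at sampling time, which matches the abstract's point that no model fine-tuning is required for the BoN step.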
Authors
Khodabandeh Yalabadi, Ali (ORCID 0009-0009-4310-0259)
Yazdani-Jahromi, Mehdi
Garibay, Ozlem Ozmen (ORCID 0000-0001-9215-694X)
PubMed record https://www.ncbi.nlm.nih.gov/pubmed/40621602
Copyright The Author(s) 2025. Published by Oxford University Press.
Discipline Biology
EISSN 2635-0041
PMCID PMC12228967
Open Access Yes
Peer Reviewed Yes
License https://creativecommons.org/licenses/by/4.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
OpenAccessLink http://dx.doi.org/10.1093/bioadv/vbaf137
PMID 40621602