Matched pairs demonstrate robustness against inter-assay variability

Bibliographic Details
Published in Journal of cheminformatics Vol. 17; no. 1; p. 8
Main Authors Nelen, Jochem; Pérez-Sánchez, Horacio; De Winter, Hans; Van Rompaey, Dries
Format Journal Article
Language English
Published Cham: Springer International Publishing, 20.01.2025
BioMed Central Ltd
Springer Nature B.V
BMC

Abstract Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and assessing the impact of assay metadata curation on error reduction. We find that potency differences between matched pairs exhibit less variability than individual compound measurements, suggesting systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was found to be 44–46% for Ki and IC50 values, respectively, which improved to 66–79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12–15% to 6–8% with extensive curation. These results establish a benchmark for expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment.
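The two agreement metrics in the abstract (share of comparisons within 0.3 pChEMBL units, share differing by more than 1 unit) can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' code: the `PairRecord` container, the function names and the toy values are invented here. It assumes each record holds pChEMBL values (-log10 of the activity in mol/L) for the same two compounds A and B measured in two assays, and it contrasts the inter-assay error of individual compound values with the error of the within-assay difference B - A, in which a constant assay-wide offset cancels.

```python
"""Illustrative sketch (not the authors' pipeline): inter-assay noise of
single-compound pChEMBL values versus matched-pair potency differences.

Assumptions: each record holds pChEMBL values for the same two compounds
(A and B) measured in two different assays (1 and 2). Thresholds of 0.3 and
1.0 pChEMBL units follow the abstract; all toy data below are invented.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRecord:
    """Hypothetical container: pChEMBL of compounds A and B in assays 1 and 2."""
    a_assay1: float
    b_assay1: float
    a_assay2: float
    b_assay2: float


def pchembl_from_nanomolar(activity_nm: float) -> float:
    """pChEMBL-style value: -log10 of the activity expressed in mol/L."""
    return -math.log10(activity_nm * 1e-9)


def single_compound_errors(records: list[PairRecord]) -> list[float]:
    """|assay1 - assay2| for every individual compound measurement."""
    errors = []
    for r in records:
        errors.append(abs(r.a_assay1 - r.a_assay2))
        errors.append(abs(r.b_assay1 - r.b_assay2))
    return errors


def pair_delta_errors(records: list[PairRecord]) -> list[float]:
    """|delta1 - delta2|, where delta = pChEMBL(B) - pChEMBL(A) within one assay.

    An offset shared by both compounds within an assay cancels in delta.
    """
    return [
        abs((r.b_assay1 - r.a_assay1) - (r.b_assay2 - r.a_assay2))
        for r in records
    ]


def agreement_stats(errors: list[float], close: float = 0.3,
                    far: float = 1.0) -> tuple[float, float]:
    """Return (fraction with error <= close, fraction with error > far)."""
    n = len(errors)
    if n == 0:
        raise ValueError("no errors to summarise")
    within = sum(e <= close for e in errors) / n
    beyond = sum(e > far for e in errors) / n
    return within, beyond


# Invented toy data: in the first two records assay 2 reads roughly 0.8-0.9
# units higher for both compounds (a systematic offset); in the third record
# the two assays disagree on the sign of the B - A difference.
TOY_RECORDS = [
    PairRecord(6.0, 7.0, 6.8, 7.8),
    PairRecord(5.5, 6.0, 6.4, 6.8),
    PairRecord(7.0, 8.0, 7.2, 6.9),
]

print("single compounds:", agreement_stats(single_compound_errors(TOY_RECORDS)))
print("matched pairs:   ", agreement_stats(pair_delta_errors(TOY_RECORDS)))
```

On this toy data, two of the three matched-pair differences agree within 0.3 units, whereas only one of the six single-compound comparisons does, because the systematic offset in the first two records cancels in the difference. The sketch compares only one pair of assays per record; the pair-matching and metadata-curation steps described in the study are not reproduced here.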
ArticleNumber 8
Audience Academic
Author_xml – sequence: 1
  givenname: Jochem
  surname: Nelen
  fullname: Nelen, Jochem
  organization: Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia, Health Sciences PhD Program, Universidad Católica de Murcia UCAM
– sequence: 2
  givenname: Horacio
  surname: Pérez-Sánchez
  fullname: Pérez-Sánchez, Horacio
  organization: Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia
– sequence: 3
  givenname: Hans
  surname: De Winter
  fullname: De Winter, Hans
  email: hans.dewinter@uantwerpen.be
  organization: Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp
– sequence: 4
  givenname: Dries
  surname: Van Rompaey
  fullname: Van Rompaey, Dries
  organization: Drug Discovery Data Sciences, Janssen Pharmaceutica NV
ContentType Journal Article
Copyright The Author(s) 2025
COPYRIGHT 2025 BioMed Central Ltd.
Copyright Springer Nature B.V. Dec 2025
DOI 10.1186/s13321-025-00956-y
Discipline Chemistry
EISSN 1758-2946
EndPage 8
GrantInformation_xml – fundername: Cátedra Villapharma-UCAM
ISSN 1758-2946
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Data curation
ChEMBL
Matched structural pairs
Assay noise
Machine learning
License 2025. The Author(s).
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
PMID 39833966
PublicationDate 2025-01-20
PublicationPlace Cham
PublicationTitle Journal of cheminformatics
PublicationTitleAbbrev J Cheminform
PublicationTitleAlternate J Cheminform
PublicationYear 2025
StartPage 8
SubjectTerms Assay noise
Assaying
Brief Report
ChEMBL
Chemistry
Chemistry and Materials Science
Computational Biology/Bioinformatics
Computer Applications in Chemistry
Data curation
Datasets
Documentation and Information in Chemistry
Error reduction
Impact analysis
Information management
Machine learning
Matched structural pairs
Metadata
Noise reduction
Quality assessment
Quality control
Theoretical and Computational Chemistry
Title Matched pairs demonstrate robustness against inter-assay variability
URI https://link.springer.com/article/10.1186/s13321-025-00956-y
https://www.ncbi.nlm.nih.gov/pubmed/39833966
https://www.proquest.com/docview/3157332805
https://www.proquest.com/docview/3157554690
https://pubmed.ncbi.nlm.nih.gov/PMC11748845
https://doaj.org/article/aafb622c6d904906a388da1a4337bd31
Volume 17