Matched pairs demonstrate robustness against inter-assay variability

Bibliographic Details
Published in	Journal of cheminformatics Vol. 17; no. 1; p. 8
Main Authors	Nelen, Jochem, Pérez-Sánchez, Horacio, De Winter, Hans, Van Rompaey, Dries
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing, BioMed Central Ltd, Springer Nature B.V., BMC; 20.01.2025
Subjects	Assay noise, Assaying, Brief Report, ChEMBL, Chemistry, Chemistry and Materials Science, Computational Biology/Bioinformatics, Computer Applications in Chemistry, Data curation, Datasets, Documentation and Information in Chemistry, Error reduction, Impact analysis, Information management, Machine learning, Matched structural pairs, Metadata, Noise reduction, Quality assessment, Quality control, Theoretical and Computational Chemistry

Abstract	Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and assessing the impact of assay metadata curation on error reduction. We find that potency differences between matched pairs exhibit less variability than individual compound measurements, suggesting systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was found to be 44–46% for Ki and IC50 values, respectively, which improved to 66–79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12–15% to 6–8% with extensive curation. These results establish a benchmark for expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment.
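The agreement figures in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the dictionary input format, the function names (`pchembl_from_nM`, `pair_deltas`, `pair_agreement`) and the toy values are hypothetical, and the sketch assumes that inter-assay agreement for a pair is measured as the absolute difference between the pair's potency difference in one assay and the same pair's difference in another assay. The 0.3 and 1.0 pChEMBL thresholds come from the abstract; treating "within 0.3" as inclusive and "exceeding 1" as strict is an assumption. pChEMBL is taken as -log10 of the molar activity value.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: compound ID -> pChEMBL value measured in one assay.
Assay = Dict[str, float]


def pchembl_from_nM(activity_nM: float) -> float:
    """pChEMBL = -log10(activity in mol/L); 1 nM = 1e-9 M."""
    return -math.log10(activity_nM * 1e-9)


def pair_deltas(assay: Assay, pairs: List[Tuple[str, str]]) -> List[float]:
    """Potency difference (first minus second compound) for each pair within one assay."""
    return [assay[a] - assay[b] for a, b in pairs]


def pair_agreement(assay1: Assay, assay2: Assay, pairs: List[Tuple[str, str]],
                   tight: float = 0.3, loose: float = 1.0) -> Dict[str, float]:
    """Compare the same pair differences measured in two assays.

    The per-pair error is |delta_in_assay1 - delta_in_assay2|. Any offset that an
    assay applies equally to both compounds of a pair cancels out of this error.
    """
    errors = [abs(d1 - d2) for d1, d2 in
              zip(pair_deltas(assay1, pairs), pair_deltas(assay2, pairs))]
    n = len(errors)
    if n == 0:
        raise ValueError("no compound pairs to compare")
    return {
        "n_pairs": n,
        # Assumed inclusive bound for "agreement within 0.3 units".
        "frac_within_tight": sum(e <= tight for e in errors) / n,
        # Assumed strict bound for "differences exceeding 1 unit".
        "frac_above_loose": sum(e > loose for e in errors) / n,
    }


# Toy data (hypothetical): assay2 reads A and B 0.8 units higher than assay1,
# a shared systematic offset, while C is shifted by 2.0, a compound-specific effect.
assay1 = {"A": 6.0, "B": 7.0, "C": 5.5}
assay2 = {"A": 6.8, "B": 7.8, "C": 7.5}
pairs = [("A", "B"), ("A", "C"), ("B", "C")]

stats = pair_agreement(assay1, assay2, pairs)
print(stats)
```

In the toy data, compounds A and B each read 0.8 units higher in the second assay, yet the (A, B) pair difference agrees exactly because the shared offset cancels; the two pairs involving C, which carries a compound-specific shift, disagree by about 1.2 units. This mirrors, in miniature, the abstract's point that systematic assay differences may partially cancel out in paired data.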
ArticleNumber	8
Audience	Academic
Author	De Winter, Hans; Pérez-Sánchez, Horacio; Van Rompaey, Dries; Nelen, Jochem
Author_xml	– sequence: 1 givenname: Jochem surname: Nelen fullname: Nelen, Jochem organization: Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia, Health Sciences PhD Program, Universidad Católica de Murcia UCAM – sequence: 2 givenname: Horacio surname: Pérez-Sánchez fullname: Pérez-Sánchez, Horacio organization: Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia – sequence: 3 givenname: Hans surname: De Winter fullname: De Winter, Hans email: hans.dewinter@uantwerpen.be organization: Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp – sequence: 4 givenname: Dries surname: Van Rompaey fullname: Van Rompaey, Dries organization: Drug Discovery Data Sciences, Janssen Pharmaceutica NV
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/39833966 (View this record in MEDLINE/PubMed)
ContentType	Journal Article
Copyright	The Author(s) 2025. COPYRIGHT 2025 BioMed Central Ltd. Copyright Springer Nature B.V. Dec 2025
Copyright_xml	– notice: The Author(s) 2025 – notice: 2025. The Author(s). – notice: COPYRIGHT 2025 BioMed Central Ltd. – notice: Copyright Springer Nature B.V. Dec 2025 – notice: The Author(s) 2025 2025
DBID	C6C AAYXX CITATION NPM ISR 3V. 7QO 7X7 7XB 8AO 8FD 8FE 8FG 8FH 8FI 8FJ 8FK ABJCF ABUWG AEUYN AFKRA ARAPS AZQEC BBNVY BENPR BGLVJ BHPHI CCPQU D1I DWQXO FR3 FYUFA GHDGH GNUQQ HCIFZ K9. KB. LK8 M0S M7P P5Z P62 P64 PDBOC PHGZM PHGZT PIMPY PKEHL PQEST PQGLB PQQKQ PQUKI PRINS 7X8 5PM DOA
DOI	10.1186/s13321-025-00956-y
DatabaseName	Springer Nature OA Free Journals CrossRef PubMed Gale In Context: Science ProQuest Central (Corporate) Biotechnology Research Abstracts Health & Medical Collection ProQuest Central (purchase pre-March 2016) ProQuest Pharma Collection Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Natural Science Collection Hospital Premium Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) Materials Science & Engineering Collection ProQuest Central (Alumni) ProQuest One Sustainability ProQuest Central UK/Ireland Advanced Technologies & Aerospace Collection ProQuest Central Essentials Biological Science Collection ProQuest Central Technology Collection Natural Science Collection ProQuest One Community College ProQuest Materials Science Collection ProQuest Central Engineering Research Database ProQuest Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Central Student ProQuest SciTech Premium Collection ProQuest Health & Medical Complete (Alumni) Materials Science Database Biological Sciences ProQuest Health & Medical Collection Biological Science Database Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection Biotechnology and BioEngineering Abstracts Materials Science Collection ProQuest Central Premium ProQuest One Academic Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China MEDLINE - Academic PubMed Central (Full Participant titles) DOAJ Directory of Open Access Journals
DatabaseTitle	CrossRef PubMed Publicly Available Content Database ProQuest Central Student Technology Collection Technology Research Database ProQuest One Academic Middle East (New) ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials Materials Science Collection ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Natural Science Collection ProQuest Pharma Collection ProQuest Central China ProQuest Central ProQuest One Applied & Life Sciences ProQuest One Sustainability Health Research Premium Collection Biotechnology Research Abstracts Health and Medicine Complete (Alumni Edition) Natural Science Collection ProQuest Central Korea Biological Science Collection Materials Science Database ProQuest Central (New) ProQuest Materials Science Collection Advanced Technologies & Aerospace Collection ProQuest Biological Science Collection ProQuest One Academic Eastern Edition ProQuest Hospital Collection ProQuest Technology Collection Health Research Premium Collection (Alumni) Biological Science Database ProQuest SciTech Collection ProQuest Hospital Collection (Alumni) Biotechnology and BioEngineering Abstracts Advanced Technologies & Aerospace Database ProQuest Health & Medical Complete ProQuest One Academic UKI Edition Materials Science & Engineering Collection Engineering Research Database ProQuest One Academic ProQuest One Academic (New) ProQuest Central (Alumni) MEDLINE - Academic
DatabaseTitleList	Publicly Available Content Database PubMed MEDLINE - Academic
Database_xml	– sequence: 1 dbid: C6C name: Springer Nature OA Free Journals url: http://www.springeropen.com/ sourceTypes: Publisher – sequence: 2 dbid: DOA name: DOAJ (Directory of Open Access Journals) url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 3 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 4 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Chemistry
EISSN	1758-2946
EndPage	8
ExternalDocumentID	oai_doaj_org_article_aafb622c6d904906a388da1a4337bd31 PMC11748845 A824347819 39833966 10_1186_s13321_025_00956_y
Genre	Journal Article
GrantInformation_xml	– fundername: Cátedra Villapharma-UCAM
GroupedDBID	-5F -5G -A0 -BR 0R~ 29K 2WC 3V. 4.4 40G 53G 5VS 7X7 8AO 8FE 8FG 8FH 8FI 8FJ AAFWJ AAJSJ AAKKN AAKPC ABDBF ABEEZ ABJCF ABUWG ACACY ACGFS ACIHN ACIWK ACPRK ACUHS ACULB ADBBV ADINQ ADRAZ ADUKV AEAQA AENEX AEUYN AFGXO AFKRA AFRAH AHBYD AHMBA AHYZX ALIPV ALMA_UNASSIGNED_HOLDINGS AMKLP AMTXH AOIJS ARAPS BAPOH BAWUL BBNVY BCNDV BENPR BFQNJ BGLVJ BHPHI BMC BPHCQ BVXVI C24 C6C CCPQU D-I D1I DIK E3Z EBLON EBS ESX F5P FRP FYUFA GROUPED_DOAJ GX1 HCIFZ HH5 HMCUK HYE IAO IGS IHR ISR ITC KB. KQ8 LK8 M48 M7P MK0 M~E O5R O5S OK1 P62 PDBOC PGMZT PIMPY PQQKQ PROAC RBZ RNS RPM RSV RVI SOJ SPH TR2 TUS U2A UKHRP AASML AAYXX AFPKN CITATION PHGZM PHGZT NPM PMFND 7QO 7XB 8FD 8FK AZQEC DWQXO FR3 GNUQQ K9. P64 PKEHL PQEST PQGLB PQUKI PRINS 7X8 5PM PUEGO
ID	FETCH-LOGICAL-c523t-e3e3bb5e705e47ced154a3e9ba2d23d3e77c7f300672e5c8f8a54b9607b7592e3
IEDL.DBID	M48
ISSN	1758-2946
IngestDate	Wed Aug 27 01:30:44 EDT 2025 Thu Aug 21 18:41:18 EDT 2025 Thu Jul 10 22:08:44 EDT 2025 Mon Jul 28 15:40:53 EDT 2025 Tue Jun 17 22:00:03 EDT 2025 Tue Jun 10 20:55:08 EDT 2025 Fri Jun 27 05:15:09 EDT 2025 Thu Jan 30 12:30:00 EST 2025 Tue Jul 01 03:49:27 EDT 2025 Fri Feb 21 02:37:29 EST 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	Data curation, ChEMBL, Matched structural pairs, Assay noise, Machine learning
Language	English
License	2025. The Author(s). Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c523t-e3e3bb5e705e47ced154a3e9ba2d23d3e77c7f300672e5c8f8a54b9607b7592e3
Notes	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.1186/s13321-025-00956-y
PMID	39833966
PQID	3157332805
PQPubID	54992
ParticipantIDs	doaj_primary_oai_doaj_org_article_aafb622c6d904906a388da1a4337bd31 pubmedcentral_primary_oai_pubmedcentral_nih_gov_11748845 proquest_miscellaneous_3157554690 proquest_journals_3157332805 gale_infotracmisc_A824347819 gale_infotracacademiconefile_A824347819 gale_incontextgauss_ISR_A824347819 pubmed_primary_39833966 crossref_primary_10_1186_s13321_025_00956_y springer_journals_10_1186_s13321_025_00956_y
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2025-01-20
PublicationDateYYYYMMDD	2025-01-20
PublicationDate_xml	– month: 01 year: 2025 text: 2025-01-20 day: 20
PublicationDecade	2020
PublicationPlace	Cham
PublicationPlace_xml	– name: Cham – name: England – name: London
PublicationTitle	Journal of cheminformatics
PublicationTitleAbbrev	J Cheminform
PublicationTitleAlternate	J Cheminform
PublicationYear	2025
Publisher	Springer International Publishing BioMed Central Ltd Springer Nature B.V BMC
Publisher_xml	– name: Springer International Publishing – name: BioMed Central Ltd – name: Springer Nature B.V – name: BMC
SSID	ssj0065707
SourceID	doaj pubmedcentral proquest gale pubmed crossref springer
SourceType	Open Website Open Access Repository Aggregation Database Index Database Publisher
StartPage	8
Title	Matched pairs demonstrate robustness against inter-assay variability
URI	https://link.springer.com/article/10.1186/s13321-025-00956-y https://www.ncbi.nlm.nih.gov/pubmed/39833966 https://www.proquest.com/docview/3157332805 https://www.proquest.com/docview/3157554690 https://pubmed.ncbi.nlm.nih.gov/PMC11748845 https://doaj.org/article/aafb622c6d904906a388da1a4337bd31
Volume	17
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	Scholars Portal
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Matched+pairs+demonstrate+robustness+against+inter-assay+variability&rft.jtitle=Journal+of+cheminformatics&rft.au=Nelen%2C+Jochem&rft.au=P%C3%A9rez-S%C3%A1nchez%2C+Horacio&rft.au=De+Winter%2C+Hans&rft.au=Van+Rompaey%2C+Dries&rft.date=2025-01-20&rft.pub=BioMed+Central+Ltd&rft.issn=1758-2946&rft.eissn=1758-2946&rft.volume=17&rft.issue=1&rft_id=info:doi/10.1186%2Fs13321-025-00956-y&rft.externalDBID=ISR&rft.externalDocID=A824347819