Matched pairs demonstrate robustness against inter-assay variability
Published in | Journal of cheminformatics Vol. 17; no. 1; p. 8 |
---|---|
Main Authors | Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey |
Format | Journal Article |
Language | English |
Published | Cham: Springer International Publishing, 20.01.2025; BioMed Central Ltd; Springer Nature B.V; BMC |
Subjects | |
Abstract | Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and assessing the impact of assay metadata curation on error reduction. We find that potency differences between matched pairs exhibit less variability than individual compound measurements, suggesting systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was 44–46% for Ki and IC50 values, respectively, which improved to 66–79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12–15% to 6–8% with extensive curation. These results establish a benchmark for expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment. |
---|---|
ArticleNumber | 8 |
Audience | Academic |
Author | De Winter, Hans; Pérez-Sánchez, Horacio; Van Rompaey, Dries; Nelen, Jochem |
Author_xml | 1. Nelen, Jochem (Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia, Health Sciences PhD Program, Universidad Católica de Murcia UCAM); 2. Pérez-Sánchez, Horacio (Structural Bioinformatics and High Performance Computing Research Group (BIO-HPC), HiTech Innovation Hub, UCAM Universidad Católica de Murcia); 3. De Winter, Hans (hans.dewinter@uantwerpen.be; Department of Pharmaceutical Sciences, Faculty of Pharmaceutical, Biomedical and Veterinary Sciences, University of Antwerp); 4. Van Rompaey, Dries (Drug Discovery Data Sciences, Janssen Pharmaceutica NV) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39833966 (MEDLINE/PubMed record) |
ContentType | Journal Article |
Copyright | The Author(s) 2025. COPYRIGHT 2025 BioMed Central Ltd. Copyright Springer Nature B.V. Dec 2025. |
DOI | 10.1186/s13321-025-00956-y |
Discipline | Chemistry |
EISSN | 1758-2946 |
EndPage | 8 |
ExternalDocumentID | oai_doaj_org_article_aafb622c6d904906a388da1a4337bd31 PMC11748845 A824347819 39833966 10_1186_s13321_025_00956_y |
Genre | Journal Article |
GrantInformation_xml | – fundername: Cátedra Villapharma-UCAM |
ISSN | 1758-2946 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Data curation; ChEMBL; Matched structural pairs; Assay noise; Machine learning |
Language | English |
License | 2025. The Author(s). Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.1186/s13321-025-00956-y |
PMID | 39833966 |
PublicationCentury | 2000 |
PublicationDate | 2025-01-20 |
PublicationDateYYYYMMDD | 2025-01-20 |
PublicationDate_xml | – month: 01 year: 2025 text: 2025-01-20 day: 20 |
PublicationDecade | 2020 |
PublicationPlace | Cham |
PublicationPlace_xml | – name: Cham – name: England – name: London |
PublicationTitle | Journal of cheminformatics |
PublicationTitleAbbrev | J Cheminform |
PublicationTitleAlternate | J Cheminform |
PublicationYear | 2025 |
Publisher | Springer International Publishing; BioMed Central Ltd; Springer Nature B.V; BMC |
StartPage | 8 |
SubjectTerms | Assay noise; Assaying; Brief Report; ChEMBL; Chemistry; Chemistry and Materials Science; Computational Biology/Bioinformatics; Computer Applications in Chemistry; Data curation; Datasets; Documentation and Information in Chemistry; Error reduction; Impact analysis; Information management; Machine learning; Matched structural pairs; Metadata; Noise reduction; Quality assessment; Quality control; Theoretical and Computational Chemistry |
Title | Matched pairs demonstrate robustness against inter-assay variability |
URI | https://link.springer.com/article/10.1186/s13321-025-00956-y https://www.ncbi.nlm.nih.gov/pubmed/39833966 https://www.proquest.com/docview/3157332805 https://www.proquest.com/docview/3157554690 https://pubmed.ncbi.nlm.nih.gov/PMC11748845 https://doaj.org/article/aafb622c6d904906a388da1a4337bd31 |
Volume | 17 |
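
The agreement figures quoted in the abstract (share of matched pairs agreeing within 0.3 pChEMBL units, share differing by more than 1 unit) can be illustrated with a minimal sketch. This is not the authors' code: the function name `pair_agreement`, the input layout, and the example numbers are all hypothetical, and it assumes each matched pair has one potency difference per assay.

```python
from typing import Iterable, Tuple


def pair_agreement(
    deltas: Iterable[Tuple[float, float]],
    tight: float = 0.3,
    loose: float = 1.0,
) -> Tuple[float, float]:
    """Return (fraction agreeing within `tight`, fraction differing by more than `loose`).

    Each item is (delta_in_assay_A, delta_in_assay_B): the pChEMBL difference
    of the same compound pair as measured in two assays (assumed layout).
    """
    diffs = [abs(d_a - d_b) for d_a, d_b in deltas]
    if not diffs:
        raise ValueError("need at least one matched pair")
    within = sum(d <= tight for d in diffs) / len(diffs)
    beyond = sum(d > loose for d in diffs) / len(diffs)
    return within, beyond


# Hypothetical example: a constant assay offset cancels in the paired difference.
a1, a2 = 6.0, 7.0            # compounds 1 and 2 in assay A
b1, b2 = 6.5, 7.5            # same compounds in assay B, which reads 0.5 higher
single_shift = abs(a1 - b1)              # disagreement for one compound
pair_shift = abs((a2 - a1) - (b2 - b1))  # disagreement for the pair difference

# Hypothetical per-pair deltas (assay A, assay B), not data from the paper.
example = [(0.50, 0.25), (1.25, 1.0), (-0.25, 1.0), (2.0, 0.5)]
within, beyond = pair_agreement(example)
```

The thresholds default to the 0.3 and 1 pChEMBL unit cut-offs named in the abstract; how the study itself selects and pairs assays is not described in this record and is left out of the sketch.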