Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement

Bibliographic Details
Published in: International Journal of Molecular Sciences, Vol. 21, no. 12, p. 4380
Main Authors: Tran-Nguyen, Viet-Khoa; Rognan, Didier
Format: Journal Article
Language: English
Published: Switzerland, MDPI AG, 19.06.2020
Summary: Developing realistic data sets for evaluating virtual screening methods is a task that has been tackled by the cheminformatics community for many years. Numerous artificially constructed data collections were developed, such as DUD, DUD-E, or DEKOIS. However, they all suffer from multiple drawbacks, one of which is the absence of experimental results confirming the impotence of presumably inactive molecules, leading to possible false negatives in the ligand sets. In light of this problem, the PubChem BioAssay database, an open-access repository providing the bioactivity information of compounds that were already tested on a biological target, is now a recommended source for data set construction. Nevertheless, there exist several issues with the use of such data that need to be properly addressed. In this article, an overview of benchmarking data collections built upon experimental PubChem BioAssay input is provided, along with a thorough discussion of noteworthy issues that one must consider during the design of new ligand sets from this database. The points raised in this review are expected to guide future developments in this regard, in hopes of offering better evaluation tools for novel in silico screening procedures.
PMCID: PMC7352161
ISSN: 1422-0067; 1661-6596
DOI: 10.3390/ijms21124380