Large-Scale Annotation of Small-Molecule Libraries Using Public Databases

Bibliographic Details
Published in: Journal of Chemical Information and Modeling, Vol. 47, no. 4, pp. 1386-1394
Main Authors: Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S. Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A
Format: Journal Article
Language: English
Published: American Chemical Society, United States, 01.07.2007
ISSN: 1549-9596, 1549-960X
DOI: 10.1021/ci700092v

Summary: While many large, publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also lack an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. As a result, annotation information is currently used to select lead compounds from a modern high-throughput screening (HTS) campaign only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs, supplying relevant information that could improve what is learned from large-scale screening efforts. Using the 2.5-million-compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that ∼4% of the library contained compounds with potential annotation in databases such as PubChem and the World Drug Index (WDI), as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (Medical Subject Headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest and to expedite the assay validation process. Automated batch annotation of thousands of screening hits is becoming feasible and has the potential to play an essential role in hit-to-lead decision making.
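To make the linking idea in the summary concrete, here is a minimal Python sketch. It has two steps: first, link library compounds to external records by exact structure match; second, tally annotation terms (such as MeSH headings) over a set of screening hits. This is not the authors' pipeline. It assumes every structure has already been reduced to a canonical string key (for example an InChIKey), so an exact match becomes a dictionary lookup. All function names, compound IDs, record IDs and data values are hypothetical toy examples.

```python
from collections import Counter

# Sketch only: every name and data value below is hypothetical.
# Assumption: each structure is already reduced to a canonical string
# key (e.g. an InChIKey), so "exact structure match" is a dict lookup.


def link_by_exact_structure(library, external_index):
    """Map each in-house compound ID to external record IDs sharing its key."""
    links = {}
    for compound_id, key in library.items():
        records = external_index.get(key)
        if records:  # keep only compounds with at least one exact match
            links[compound_id] = list(records)
    return links


def linked_fraction(library, links):
    """Fraction of the library that links to at least one external record."""
    return len(links) / len(library) if library else 0.0


def term_counts(hit_ids, links, record_terms):
    """Count how many hits carry each annotation term (e.g. MeSH headings).

    A term is counted once per compound, even if several of the
    compound's linked records carry it.
    """
    counts = Counter()
    for compound_id in hit_ids:
        terms = set()
        for record_id in links.get(compound_id, []):
            terms.update(record_terms.get(record_id, ()))
        counts.update(terms)
    return counts


# Toy data, invented for illustration only.
library = {"cpd1": "KEY-A", "cpd2": "KEY-B", "cpd3": "KEY-C", "cpd4": "KEY-A"}
external_index = {"KEY-A": ["ext-1"], "KEY-C": ["ext-2", "ext-3"]}
record_terms = {
    "ext-1": ["Protein Kinase Inhibitors"],
    "ext-2": ["Antimalarials"],
    "ext-3": ["Antimalarials", "Enzyme Inhibitors"],
}

links = link_by_exact_structure(library, external_index)
print(linked_fraction(library, links))  # 0.75
print(term_counts(["cpd1", "cpd3", "cpd4"], links, record_terms).most_common(1))
```

Keying on a precomputed canonical string keeps the match strictly exact, which mirrors the "exact structure match" wording. Real canonicalization (tautomers, salts, stereochemistry) would need a cheminformatics toolkit and is outside this sketch.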