Efficient Substructure Searching of Large Chemical Libraries: The ABCD Chemical Cartridge

Bibliographic Details
Published in	Journal of Chemical Information and Modeling, Vol. 51, no. 12, pp. 3113-3130
Main Authors	Agrafiotis, Dimitris K; Lobanov, Victor S; Shemanarev, Maxim; Rassokhin, Dmitrii N; Izrailev, Sergei; Jaeger, Edward P; Alex, Simson; Farnum, Michael
Format	Journal Article
Language	English
Published	Washington, DC: American Chemical Society, 27.12.2011
Subjects	Algorithms | Applied sciences | Atoms & subatomic particles | Biological and medical sciences | Chemical Information | Computer science; control theory; systems | Databases, Factual | Drug Discovery - economics | Exact sciences and technology | General pharmacology | Informatics - economics | Informatics - methods | Information management | Information retrieval. Graph | Information systems. Data bases | Medical sciences | Memory organisation. Data processing | Molecules | Pharmaceutical technology. Pharmaceutical industry | Pharmacology. Drug treatments | R&D | Research & development | Small Molecule Libraries - chemistry | Software | Theoretical computing | Time Factors | Capability index | Isomorphic graph | Acceptance | Database query | Relational database | Algorithmics | Very large databases | Experimental study | SQL | Screening | File structure | Pattern classification | Management information systems | Database | Problem solving | Cost estimation | Pharmaceutical industry | Oracle | Chemical system | User need | Indexing

More Information
Summary:	Efficient substructure searching is a key requirement for any chemical information management system. In this paper, we describe the substructure search capabilities of ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. The solution consists of several algorithmic components: 1) a pattern mapping algorithm for solving the subgraph isomorphism problem, 2) an indexing scheme that enables very fast substructure searches on large structure files, 3) the incorporation of that indexing scheme into an Oracle cartridge to enable querying large relational databases through SQL, and 4) a cost estimation scheme that allows the Oracle cost-based optimizer to generate a good execution plan when a substructure search is combined with additional constraints in a single SQL query. The algorithm was tested on a public database comprising nearly 1 million molecules using 4,629 substructure queries, the vast majority of which were submitted by discovery scientists over the last 2.5 years of user acceptance testing of ABCD. 80.7% of these queries were completed in less than a second and 96.8% in less than ten seconds on a single CPU, while on eight processing cores these numbers increased to 93.2% and 99.7%, respectively. The slower queries involved extremely generic patterns that returned the entire database as screening hits and required extensive atom-by-atom verification.
ISSN:	1549-9596 1549-960X
DOI:	10.1021/ci200413e
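
Illustration: The summary describes a two-stage search in which an index first produces screening hits and the survivors then undergo atom-by-atom verification. The sketch below shows that general screen-then-verify pattern on toy molecular graphs. It is not the ABCD algorithm or its index: the `Mol` class, the count-based screening keys, and the backtracking matcher are hypothetical stand-ins chosen for brevity, and bond orders, aromaticity, and hydrogens are ignored.

```python
from collections import Counter


class Mol:
    """Toy molecule (hypothetical): element symbols plus undirected bonds."""

    def __init__(self, atoms, bonds):
        self.atoms = list(atoms)
        self.adj = {i: set() for i in range(len(self.atoms))}
        for a, b in bonds:
            self.adj[a].add(b)
            self.adj[b].add(a)

    def keys(self):
        """Screening keys: counts of elements and of bonded element pairs."""
        k = Counter(self.atoms)
        for a, nbrs in self.adj.items():
            for b in nbrs:
                if a < b:  # count each bond once
                    k[tuple(sorted((self.atoms[a], self.atoms[b])))] += 1
        return k


def passes_screen(query_keys, target_keys):
    """Necessary condition only: the target has at least the query's features."""
    return all(target_keys[k] >= n for k, n in query_keys.items())


def verify(query, target):
    """Atom-by-atom check by backtracking (subgraph monomorphism)."""
    mapping, used = {}, set()  # query atom -> target atom; target atoms taken

    def extend(q):
        if q == len(query.atoms):
            return True
        for t in range(len(target.atoms)):
            if t in used or target.atoms[t] != query.atoms[q]:
                continue
            # Every already-mapped neighbour of q must be bonded to t.
            if all(mapping[nb] in target.adj[t]
                   for nb in query.adj[q] if nb in mapping):
                mapping[q] = t
                used.add(t)
                if extend(q + 1):
                    return True
                del mapping[q]
                used.discard(t)
        return False

    return extend(0)


def search(query, library, index):
    """Screen with precomputed keys, then verify only the screening hits."""
    qk = query.keys()
    candidates = [name for name in library if passes_screen(qk, index[name])]
    hits = [name for name in candidates if verify(query, library[name])]
    return candidates, hits


# Query: O-C-C-O (a 1,2-diol skeleton, heavy atoms only).
query = Mol(["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
library = {
    "ethylene glycol": Mol(["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),
    "1,1-ethanediol": Mol(["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "ethanol": Mol(["C", "C", "O"], [(0, 1), (1, 2)]),
}
index = {name: mol.keys() for name, mol in library.items()}  # built once

candidates, hits = search(query, library, index)
print(candidates)  # ['ethylene glycol', '1,1-ethanediol']
print(hits)        # ['ethylene glycol']
```

In this toy run, 1,1-ethanediol passes the screen because its feature counts match the query, but fails verification because both oxygens sit on the same carbon. A very generic query would pass the screen for nearly every molecule and leave all the work to verification, which is consistent with the summary's remark about the slower queries.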