New Publicly Available Chemical Query Language, CSRML, To Support Chemotype Representations for Application to Data Mining and Modeling

Bibliographic Details
Published in Journal of chemical information and modeling Vol. 55; no. 3; pp. 510-528
Main Authors Yang, Chihae, Tarkhov, Aleksey, Marusczyk, Jörg, Bienfait, Bruno, Gasteiger, Johann, Kleinoeder, Thomas, Magdziarz, Tomasz, Sacher, Oliver, Schwab, Christof H, Schwoebel, Johannes, Terfloth, Lothar, Arvidson, Kirk, Richard, Ann, Worth, Andrew, Rathman, James
Format Journal Article
Language English
Published United States American Chemical Society 23.03.2015
Summary: Chemotypes are a new approach for representing molecules, chemical substructures and patterns, reaction rules, and reactions. Chemotypes are capable of integrating types of information beyond what is possible using current representation methods (e.g., SMARTS patterns) or reaction transformations (e.g., SMIRKS, reaction SMILES). Chemotypes are expressed in the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML), and can be encoded not only with connectivity and topology but also with properties of atoms, bonds, electronic systems, or molecules. CSRML has been developed in parallel with a public set of chemotypes, i.e., the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space, as well as to represent chemical patterns and properties especially relevant to various toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML-based CSRML standard used to express chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.
ISSN: 1549-9596
1549-960X
DOI: 10.1021/ci500667v