Development of Binary Classification of Structural Chromosome Aberrations for a Diverse Set of Organic Compounds from Molecular Structure

Bibliographic Details
Published in Chemical Research in Toxicology Vol. 16; no. 2; pp. 153-163
Main Authors Serra, J. R.; Thompson, E. D.; Jurs, P. C.
Format Journal Article
Language English
Published United States American Chemical Society 01.02.2003
ISSN 0893-228X
1520-5010
DOI 10.1021/tx020077w

Summary: Classification models are generated to predict in vitro cytogenetic results for a diverse set of 383 organic compounds. Both k-nearest neighbor and support vector machine models are developed, based on calculated molecular structure descriptors. The endpoints are the labels clastogenic or nonclastogenic according to an in vitro chromosomal aberration assay with Chinese hamster lung cells. Compounds that were tested with both 24 h and 48 h exposures are included. Each compound is represented by calculated molecular structure descriptors encoding the topological, electronic, geometrical, or polar surface area aspects of the structure. Subsets of informative descriptors are identified with genetic algorithm feature selection coupled to the appropriate classification algorithm. The overall classification success rate for a k-nearest neighbor classifier built with just six topological descriptors is 81.2% for the training set and 86.5% for an external prediction set. The overall classification success rate for a three-descriptor support vector machine model is 99.7% for the training set, 92.1% for the cross-validation set, and 83.8% for an external prediction set.
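
The success rates quoted in the summary are percentages of correctly classified compounds. As a rough illustration only, the sketch below shows how a k-nearest neighbor classifier could assign a clastogenic or nonclastogenic label from descriptor vectors and how such a rate could be computed on a held-out set. The function names `knn_predict` and `success_rate`, the choice of `k=3`, the Euclidean distance, and the two-dimensional toy descriptor values are all assumptions made for this example; they are not the authors' descriptors, data, or settings, and the genetic algorithm feature selection step is omitted.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training compounds by distance to the query in descriptor space.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def success_rate(train_X, train_y, test_X, test_y, k=3):
    """Percentage of test compounds whose predicted label matches the true one."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return 100.0 * hits / len(test_y)


# Hypothetical, made-up descriptor vectors (NOT data from the paper).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (0.8, 1.0), (1.0, 0.9)]
train_y = ["nonclastogenic"] * 3 + ["clastogenic"] * 3

# A small hypothetical external prediction set.
test_X = [(0.15, 0.15), (0.95, 0.9), (0.7, 0.8), (0.6, 0.4)]
test_y = ["nonclastogenic", "clastogenic", "clastogenic", "clastogenic"]

rate = success_rate(train_X, train_y, test_X, test_y, k=3)
print(f"External prediction success rate: {rate:.1f}%")
```

Reporting the rate separately for a training set and an external prediction set, as the summary does, helps show whether a model generalizes beyond the compounds it was fitted on.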
Bibliography:istex:FC807E2BD789BA5B51DB8F56980299E5736760AF
ark:/67375/TPS-CX961TD2-J