Contemporary QSAR Classifiers Compared

Bibliographic Details
Published in	Journal of Chemical Information and Modeling, Vol. 47, no. 1, pp. 219-227
Main Authors	Bruce, Craig L; Melville, James L; Pickett, Stephen D; Hirst, Jonathan D
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 01.01.2007
Subjects	Algorithms; Artificial Intelligence; Classification; Comparative analysis; Drugs; Models, Statistical; Quantitative Structure-Activity Relationship
Summary:	We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
Bibliography:	ark:/67375/TPS-891C2MSR-6; istex:E2BBF3F14D93E0F0DA952A9336AA815440714410
ISSN:	1549-9596, 1549-960X
DOI:	10.1021/ci600332j