Contemporary QSAR Classifiers Compared

Bibliographic Details
Published in: Journal of Chemical Information and Modeling, Vol. 47, no. 1, pp. 219-227
Main Authors: Bruce, Craig L; Melville, James L; Pickett, Stephen D; Hirst, Jonathan D
Format: Journal Article
Language: English
Published: United States, American Chemical Society, 01.01.2007
Summary: We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
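
Illustrative sketch (not from the article): the summary mentions multiple comparison statistical tests across data sets but this record does not say which tests were used. One common choice for comparing several classifiers over several data sets is the Friedman rank test, assumed here purely for illustration. The function names and the toy scores are hypothetical and are not results from the paper.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank methods within each data set (1 = best score), then average
    each method's rank over all data sets. Tied scores share the mean rank."""
    names = list(scores)
    n_sets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for i in range(n_sets):
        # Order methods from highest to lowest score on data set i.
        row = sorted(names, key=lambda m: -scores[m][i])
        pos = 0
        while pos < len(row):
            # Extend `end` over the run of methods tied with row[pos].
            end = pos
            while end + 1 < len(row) and scores[row[end + 1]][i] == scores[row[pos]][i]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for m in row[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return {m: totals[m] / n_sets for m in names}


def friedman_statistic(scores: Dict[str, List[float]]) -> float:
    """Friedman chi-square statistic for k methods scored on n data sets."""
    ranks = average_ranks(scores)
    k = len(ranks)
    n = len(next(iter(scores.values())))
    sum_sq = sum(r * r for r in ranks.values())
    return 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Invented placeholder scores for three unnamed methods on four data sets.
    toy = {
        "method_a": [0.90, 0.90, 0.90, 0.90],
        "method_b": [0.80, 0.80, 0.80, 0.80],
        "method_c": [0.70, 0.70, 0.70, 0.70],
    }
    print(average_ranks(toy))
    print(friedman_statistic(toy))
```

A large statistic suggests that the methods do not all perform alike; it would then be compared against a chi-square distribution with k - 1 degrees of freedom, typically followed by a post hoc pairwise test.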
Bibliography: ark:/67375/TPS-891C2MSR-6; istex:E2BBF3F14D93E0F0DA952A9336AA815440714410
ISSN: 1549-9596, 1549-960X
DOI: 10.1021/ci600332j