Contemporary QSAR Classifiers Compared
Published in | Journal of Chemical Information and Modeling, Vol. 47, no. 1, pp. 219-227 |
---|---|
Format | Journal Article |
Language | English |
Published | United States: American Chemical Society, 01.01.2007 |
Summary: | We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models. |
---|---|
Bibliography: | ark:/67375/TPS-891C2MSR-6; istex:E2BBF3F14D93E0F0DA952A9336AA815440714410 |
ISSN: | 1549-9596, 1549-960X |
DOI: | 10.1021/ci600332j |