Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 25.07.2018 |
Subjects | |
Summary: | Background: Choosing the best-performing method in terms of outcome
prediction or variable selection is a recurring problem in prognosis studies,
leading to many publications on method comparison. But some aspects have
received little attention. First, most comparison studies treat prediction
performance and variable selection separately. Second, methods are
either compared within a binary outcome setting (based on an arbitrarily chosen
delay) or within a survival setting, but not both. In this paper, we propose a
comparison methodology to weigh up those different settings both in terms of
prediction and variable selection, while incorporating advanced machine
learning strategies. Methods: Using a high-dimensional case study on a
sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the
binary outcome setting, we consider logistic regression (LR), support vector
machine (SVM), random forest (RF), gradient boosting (GB) and neural network
(NN); in the survival analysis setting, we consider the Cox Proportional
Hazards (PH), the CURE and the C-mix models. We then compare the performance of
all methods both in terms of risk prediction and variable selection, with a
focus on the use of the Elastic-Net regularization technique. Results: Among all
assessed statistical methods, the C-mix model yields the best
performance in both considered settings, as well as interesting
interpretation aspects. There is some consistency in selected covariates across
methods within a setting, but not much across the two settings. Conclusions: It
appears that learning within the survival setting first, and then going back
to a binary prediction using the survival estimates, significantly enhances
binary predictions. |
---|---|
DOI: | 10.48550/arxiv.1807.09821 |
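
Illustrative note: the summary's conclusion (fit a survival model first, then derive a binary prediction from the survival estimates) can be sketched for the Cox PH case, where the probability of an event before a chosen delay t is 1 - exp(-H0(t) * exp(x'beta)). The sketch below is only an assumption about how such a conversion might look, not the authors' procedure; the function name, the baseline cumulative hazard value, the linear predictors and the 0.2 decision threshold are all hypothetical.

```python
import math


def early_event_probability(linear_predictor, baseline_cum_hazard):
    """Probability of an event before the delay under a Cox PH model.

    linear_predictor    : x'beta for one patient (hypothetical fitted value)
    baseline_cum_hazard : H0(delay), baseline cumulative hazard at the delay
    """
    # Cox PH survival function: S(t | x) = exp(-H0(t) * exp(x'beta))
    survival = math.exp(-baseline_cum_hazard * math.exp(linear_predictor))
    return 1.0 - survival


# Hypothetical inputs, chosen for illustration only.
H0_AT_DELAY = 0.10                  # assumed baseline cumulative hazard at the delay
linear_predictors = [-1.0, 0.0, 1.5]
THRESHOLD = 0.2                     # assumed decision threshold

probabilities = [
    early_event_probability(lp, H0_AT_DELAY) for lp in linear_predictors
]
# Binary "early readmission" labels derived from the survival estimates.
labels = [p >= THRESHOLD for p in probabilities]
```

Because the probability is monotone in the linear predictor, ranking patients by the survival-based probability at any delay gives the same order as ranking by the Cox risk score; only the thresholded labels depend on the chosen delay.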