Predictive analytics with gradient boosting in clinical medicine

Predictive analytics play an important role in clinical research. An accurate predictive model can help clinicians stratify risk thereby allowing the identification of a target population which might benefit from a certain intervention. Conventionally, predictive analytics is performed using paramet...

Full description

Saved in:

Bibliographic Details
Published in	Annals of translational medicine Vol. 7; no. 7; p. 152
Main Authors	Zhang, Zhongheng, Zhao, Yiming, Canes, Aran, Steinberg, Dan, Lyashevska, Olga
Format	Journal Article
Language	English
Published	China AME Publishing Company 01.04.2019
Subjects	Big-data Clinical Trial Column clinical medicine prediction decision tree Gradient boosting
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Predictive analytics play an important role in clinical research. An accurate predictive model can help clinicians stratify risk thereby allowing the identification of a target population which might benefit from a certain intervention. Conventionally, predictive analytics is performed using parametric modeling which comes with a number of assumptions. For example, generalized linear regression models require linearity and additivity to hold for the underlying data. However, these assumptions may not hold in practice. Especially in the era of big data, a large number of covariates or features can be extracted from an electronic database which might have complex interactions and higher-order terms among the covariates. Conventional modeling methods have trouble capturing such high-dimensional relationships. However, some sophisticated machine learning techniques have been invented to handle this situation. Gradient boosting is one of these techniques which is able to recursively fit a weak learner to the residual so as to improve model performance with a gradually increasing number of iterations. It can automatically discover complex data structure, including nonlinearity and high-order interactions, even in the context of hundreds, thousands, or tens-of-thousands of potential predictors. This paper aims to introduce how gradient boosting works. The principles behind this learning machine are explained with a small example in a step-by-step manner. The formal implementation of gradient tree boosting is then illustrated with the package. In the simulated example complexity of data structure is created by generating certain interactions between the covariates. This example shows that gradient boosting can better capture these complex relationships than a generalized linear model-based approach.
Bibliography:	SourceType-Other Sources-1 content type line 63 ObjectType-Editorial-2 ObjectType-Commentary-1 These authors contributed equally to this work.
ISSN:	2305-5839 2305-5839
DOI:	10.21037/atm.2019.03.29