Multiple kernel learning with random effects for predicting longitudinal outcomes and data integration

Bibliographic Details
Published in Biometrics Vol. 71; no. 4; pp. 918-928
Main Authors Chen, Tianle; Zeng, Donglin; Wang, Yuanjia
Format Journal Article
Language English
Published United States International Biometric Society, etc. 01.12.2015
Blackwell Publishing Ltd
International Biometric Society
Summary: Predicting disease risk and progression is one of the main goals in many clinical research studies. Cohort studies on the natural history and etiology of chronic diseases span years, and data are collected at multiple visits. Although kernel‐based statistical learning methods have proven powerful for a wide range of disease prediction problems, they are well studied only for independent data, not for longitudinal data. It is thus important to develop time‐sensitive prediction rules that make use of the longitudinal nature of the data. In this paper, we develop a novel statistical learning method for longitudinal data by introducing subject‐specific short‐term and long‐term latent effects through a designed kernel to account for within‐subject correlation of longitudinal measurements. Since the presence of multiple sources of data is increasingly common, we embed our method in a multiple kernel learning framework and propose a regularized multiple kernel statistical learning method with random effects to construct effective nonparametric prediction rules. Our method allows easy integration of various heterogeneous data sources and takes advantage of correlation among longitudinal measures to increase prediction power. We use a different kernel for each data source, taking advantage of the distinctive features of each data modality, and then optimally combine data across modalities. We apply the developed methods to two large epidemiological studies, one on Huntington's disease and the other on Alzheimer's disease (Alzheimer's Disease Neuroimaging Initiative, ADNI), where we explore a unique opportunity to combine imaging and genetic data to study prediction of mild cognitive impairment, and show a substantial gain in performance while accounting for the longitudinal aspect of the data.
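Illustration: the summary describes combining one kernel per data source with subject‐specific long‐term and short‐term effects. The Python sketch below shows one way such a combined kernel could be assembled and used in a kernel ridge fit. It is not the authors' estimator. The function names, the Gaussian kernels, the exponential time decay, the fixed weights, and the ridge penalty are all assumptions made for illustration, whereas the summary indicates that the published method combines the kernels optimally under regularization.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def subject_kernels(ids_a, t_a, ids_b, t_b, rho):
    """Hypothetical within-subject kernels.

    long_term:  1 when two visits belong to the same subject (shared intercept).
    short_term: same-subject indicator times an exponential decay in the
                time gap between visits (assumed form, not from the paper).
    """
    same = (ids_a[:, None] == ids_b[None, :]).astype(float)
    decay = np.exp(-rho * np.abs(t_a[:, None] - t_b[None, :]))
    return same, same * decay


def combined_kernel(mods_a, ids_a, t_a, mods_b, ids_b, t_b, gammas, weights, rho):
    """Non-negative weighted sum of one RBF kernel per modality plus the
    two subject kernels. The weights are fixed here, not learned."""
    long_term, short_term = subject_kernels(ids_a, t_a, ids_b, t_b, rho)
    parts = [rbf_kernel(Xa, Xb, g) for Xa, Xb, g in zip(mods_a, mods_b, gammas)]
    parts += [long_term, short_term]
    return sum(w * P for w, P in zip(weights, parts))


# Synthetic toy data: 4 subjects, 3 visits each, two made-up modalities.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(4), 3)
times = np.tile(np.array([0.0, 1.0, 2.0]), 4)
mod_a = rng.normal(size=(12, 5))
mod_b = rng.normal(size=(12, 3))
y = rng.normal(size=12)

gammas = [0.1, 0.5]              # one bandwidth per modality (assumed values)
weights = [0.4, 0.3, 0.2, 0.1]   # modality A, modality B, long-term, short-term

K = combined_kernel([mod_a, mod_b], ids, times,
                    [mod_a, mod_b], ids, times,
                    gammas, weights, rho=1.0)

# Kernel ridge fit: solve (K + lam * I) alpha = y, then predict with K @ alpha.
lam = 0.1
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
pred = K @ alpha
```

Because each component is a valid kernel and the weights are non-negative, the sum is again a valid kernel, so any kernel method could be substituted for the ridge step used here.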
Bibliography: http://dx.doi.org/10.1111/biom.12343
ArticleID: BIOM12343
ark:/67375/WNG-5TC7SNRX-C
istex:2C3257F1FC74D8457BB19A4994600A901497F6DE
ISSN: 0006-341X
1541-0420
DOI: 10.1111/biom.12343