Bi-level multi-source learning for heterogeneous block-wise missing data

Bibliographic Details
Published in	NeuroImage (Orlando, Fla.) Vol. 102; pp. 192 - 206
Main Authors	Xiang, Shuo; Yuan, Lei; Fan, Wei; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
Format	Journal Article
Language	English
Published	United States: Elsevier Inc; Elsevier Limited, 15.11.2014
Subjects	Accuracy; Algorithms; Alzheimer Disease - cerebrospinal fluid; Alzheimer Disease - diagnosis; Alzheimer's disease; Biomedical research; Block-wise missing data; Classification; Data Mining; Humans; Magnetic Resonance Imaging; Medical imaging; Multi-modal fusion; Multi-source; Neuroimaging - statistics & numerical data; NMR; Nuclear magnetic resonance; Optimization; Positron-Emission Tomography; Protein expression; Proteomics
Summary:	Bio-imaging technologies allow scientists to collect large amounts of high-dimensional data from multiple heterogeneous sources for many biomedical applications. In the study of Alzheimer's Disease (AD), neuroimaging data, gene/protein expression data, etc., are often analyzed together to improve predictive power. Joint learning from multiple complementary data sources is advantageous, but feature-pruning and data source selection are critical to learn interpretable models from high-dimensional data. Often, the data collected has block-wise missing entries. In the Alzheimer's Disease Neuroimaging Initiative (ADNI), most subjects have MRI and genetic information, but only half have cerebrospinal fluid (CSF) measures, a different half has FDG-PET; only some have proteomic data. Here we propose how to effectively integrate information from multiple heterogeneous data sources when data is block-wise missing. We present a unified “bi-level” learning model for complete multi-source data, and extend it to incomplete data. Our major contributions are: (1) our proposed models unify feature-level and source-level analysis, including several existing feature learning approaches as special cases; (2) the model for incomplete data avoids imputing missing data and offers superior performance; it generalizes to other applications with block-wise missing data sources; (3) we present efficient optimization algorithms for modeling complete and incomplete data. We comprehensively evaluate the proposed models including all ADNI subjects with at least one of four data types at baseline: MRI, FDG-PET, CSF and proteomics. Our proposed models compare favorably with existing approaches. 
• Ability to fuse large multi-modal datasets with large segments of missing entries.
• A unified framework to perform both feature-level and source-level analysis.
• Efficient optimization algorithms for both models with complete and incomplete data.
• Detailed evaluation and comparison on clinical group classification problems.
Bibliography:	Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but most of them did not participate in analysis or writing of this report. A complete listing of ADNI investigators may be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
ISSN:	1053-8119 1095-9572
DOI:	10.1016/j.neuroimage.2013.08.015