Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes

Bibliographic Details
Published in Biology (Basel, Switzerland) Vol. 11; no. 3; p. 360
Main Authors Zhou, Kaiyue; Kottoori, Bhagya Shree; Munj, Seeya Awadhut; Zhang, Zhewei; Draghici, Sorin; Arslanturk, Suzan
Format Journal Article
Language English
Published Switzerland MDPI AG 24.02.2022
Summary: Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, because the disease is heterogeneous, progression risk cannot be understood, nor long- and short-term survivors differentiated, by analyzing data from a single modality. A deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA methylation, miRNA) could lead to more accurate and robust prediction of disease progression. Here, we propose an autoencoder-based multimodal data fusion system in which a fusion encoder flexibly integrates the collective information available through multiple studies with partially coupled data. Results on a fully controlled simulation-based study show that inferring the missing data through the proposed data fusion pipeline yields a predictor that outperforms baseline predictors with missing modalities. Results further show that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be differentiated with AUCs of 0.94, 0.75, and 0.96, respectively.
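The AUC figures quoted in the summary measure how well a predictor's scores rank one survivor group above the other. As a minimal sketch, and not the authors' code, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc`, the choice of long-term survivors as the positive class, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = long-term survivor, 0 = short-term survivor.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.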
ISSN: 2079-7737
DOI: 10.3390/biology11030360