Graph-Guided Bayesian Factor Model for Integrative Analysis of Multi-modal Data with Noisy Network Information

Bibliographic Details
Published in: Statistics in Biosciences
Main Authors: Li, Wenrui; Zhang, Qiyiwen; Qu, Kewen; Long, Qi
Format: Journal Article
Language: English
Published: United States, 11.08.2024
Summary: There is a growing body of literature on factor analysis that can capture individual and shared structures in multi-modal data. However, few of these approaches incorporate biological knowledge such as functional genomics and functional metabolomics. Graph-guided statistical learning methods that can incorporate knowledge of underlying networks have been shown to improve prediction and classification accuracy and to yield more interpretable results. These methods, however, typically use graphs extracted from existing databases or derived from subject matter expertise, and such graphs are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian factor model that can account for network noise and identify globally shared, partially shared and modality-specific latent factors in multi-modal data. Specifically, we use two sources of network information, namely the noisy graph extracted from existing databases and the graph estimated from the observed features in the dataset at hand, to inform the model for the true underlying network via a latent scale modeling framework. This network model is coupled with a Bayesian factor analysis model with shrinkage priors that encourage feature-wise and modality-wise sparsity, thereby allowing feature selection and identification of each type of factor. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations and through analyses of gene expression and metabolomics datasets for Alzheimer's disease.
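The distinction between globally shared, partially shared and modality-specific factors comes from which modalities each latent factor loads on. The sketch below illustrates only that idea, assuming a generic linear multi-modal factor model X_m = Z W_m^T + E_m with one latent score matrix Z shared across modalities and whole loading columns switched off per modality. It is not the authors' graph-guided model: there is no network prior, no shrinkage prior and no MCMC, and the function names `simulate_multimodal` and `classify_factors` are hypothetical.

```python
import numpy as np


def simulate_multimodal(n, dims, active, noise_sd=0.1, seed=0):
    """Simulate X_m = Z @ W_m.T + E_m for each modality m (hypothetical helper).

    active[m][k] is truthy when factor k loads on modality m; the loading
    column of an inactive factor is zeroed, mimicking modality-wise sparsity.
    """
    rng = np.random.default_rng(seed)
    active = np.asarray(active, dtype=bool)
    n_factors = active.shape[1]
    Z = rng.standard_normal((n, n_factors))  # latent scores shared by all modalities
    loadings, data = [], []
    for m, p in enumerate(dims):
        # Broadcasting the (n_factors,) mask zeroes whole columns of W_m.
        W = rng.standard_normal((p, n_factors)) * active[m]
        X = Z @ W.T + noise_sd * rng.standard_normal((n, p))
        loadings.append(W)
        data.append(X)
    return Z, loadings, data


def classify_factors(loadings, tol=1e-8):
    """Label each factor by the set of modalities with a nonzero loading column."""
    n_modalities = len(loadings)
    n_factors = loadings[0].shape[1]
    labels = []
    for k in range(n_factors):
        on = [m for m in range(n_modalities)
              if np.abs(loadings[m][:, k]).max() > tol]
        if len(on) == n_modalities:
            labels.append("globally shared")
        elif len(on) > 1:
            labels.append("partially shared")
        elif len(on) == 1:
            labels.append(f"specific to modality {on[0]}")
        else:
            labels.append("inactive")
    return labels


# Three modalities, four factors: factor 0 loads on all three, factor 1 on
# modalities 0 and 1, factor 2 only on modality 0, factor 3 only on modality 2.
mask = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 1]]
Z, W, X = simulate_multimodal(50, [20, 15, 10], mask)
print(classify_factors(W))
```

Because this sketch classifies the true simulated loadings, exact zeros make the threshold trivial. In a fitted Bayesian model the inactive columns would only be shrunk toward zero, so the rule for deciding that a column is "off" would have to come from the posterior rather than a fixed `tol`.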
ISSN: 1867-1764, 1867-1772
DOI: 10.1007/s12561-024-09452-7