Prediction with high dimensional regression via hierarchically structured Gaussian mixtures and latent variables

Bibliographic Details
Published in	Journal of the Royal Statistical Society Series C: Applied Statistics, Vol. 68, no. 5, pp. 1485-1507
Main Authors	Tu, Chun-Chen; Forbes, Florence; Lemasson, Benjamin; Wang, Naisyin
Format	Journal Article
Language	English
Published	Oxford: Wiley; Oxford University Press, 01.11.2019
Subjects	Clusters; Covariance matrix; Data acquisition; Data structures; Datasets; Expectation–maximization; Fingerprinting; High dimension; Magnetic resonance; Magnetic resonance vascular fingerprinting; Mapping; Mathematics; Matrices; Mixture of regressions; Mixtures; Oranges; Outliers (statistics); Parameter estimation; Probabilistic models; Robustness; Statistics; Structural hierarchy
Summary:	We propose a hierarchical Gaussian locally linear mapping structured mixture model, named HGLLiM, to predict low dimensional responses based on high dimensional covariates when the associations between the responses and the covariates are non-linear. For tractability, HGLLiM adopts inverse regression to handle the high dimension and locally linear mappings to capture potentially non-linear relations. Data with similar associations are grouped together to form a cluster. A mixture is composed of several clusters following a hierarchical structure. This structure enables shared covariance matrices and latent factors across smaller clusters to limit the number of parameters to estimate. Moreover, HGLLiM adopts a robust estimation procedure for model stability. We use three real data sets to demonstrate different features of HGLLiM. With the face data set, HGLLiM shows its ability to model non-linear relationships through mixtures. With the orange juice data set, we show that the prediction performance of HGLLiM is robust to the presence of outliers. Moreover, we demonstrate that HGLLiM is capable of handling large-scale complex data by using the data acquired from a magnetic resonance vascular fingerprinting study. These examples illustrate the wide applicability of HGLLiM to handle different aspects of a complex data structure in prediction.
ISSN:	0035-9254, 1467-9876
DOI:	10.1111/rssc.12370
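
Illustrative note: the summary describes combining inverse regression with a mixture of locally linear mappings. The sketch below is not the HGLLiM algorithm (it has no hierarchy, no latent factors, no robust estimation and no parameter fitting); it only shows, for a scalar covariate x and scalar response y, how a forward prediction E[y | x] can be derived from inverse-regression parameters by standard Gaussian conditioning. The function names, the dictionary keys (pi, a, b, s, c, g) and all numeric values are assumptions made up for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_predict(x, components):
    """Predict y from x under a mixture of inverse linear regressions.

    Each component k is a dict (hypothetical layout) with:
      pi : mixture weight
      a, b, s : inverse regression  x | y, k ~ N(a * y + b, s)
      c, g    : response prior      y | k    ~ N(c, g)
    """
    weighted = []
    for k in components:
        # Marginal of x in component k, obtained by integrating out y.
        mean_x = k["a"] * k["c"] + k["b"]
        var_x = k["s"] + k["a"] ** 2 * k["g"]
        # Local forward regression E[y | x, k] from Gaussian conditioning.
        gain = k["g"] * k["a"] / var_x
        local_pred = k["c"] + gain * (x - mean_x)
        # Unnormalised posterior weight of component k given x.
        weighted.append((k["pi"] * normal_pdf(x, mean_x, var_x), local_pred))
    total = sum(w for w, _ in weighted)
    # Prediction is the posterior-weighted average of the local linear maps.
    return sum(w * p for w, p in weighted) / total


# Two made-up components whose local slopes differ, giving a non-linear fit.
toy_components = [
    {"pi": 0.5, "a": 1.0, "b": 0.0, "s": 0.1, "c": -2.0, "g": 1.0},
    {"pi": 0.5, "a": 3.0, "b": 1.0, "s": 0.1, "c": 2.0, "g": 1.0},
]
toy_prediction = forward_predict(0.5, toy_components)
```

The appeal of the inverse direction, as the summary indicates, is tractability: when x is high dimensional and y is low dimensional, the regression of x on y involves far fewer parameters per component than a direct regression of y on x. The scalar case above hides that saving but keeps the conditioning step readable.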