Prediction with high dimensional regression via hierarchically structured Gaussian mixtures and latent variables

Bibliographic Details
Published in: Journal of the Royal Statistical Society Series C: Applied Statistics, Vol. 68, no. 5, pp. 1485-1507
Main Authors: Tu, Chun-Chen; Forbes, Florence; Lemasson, Benjamin; Wang, Naisyin
Format: Journal Article
Language: English
Published: Oxford, Wiley, 01.11.2019
Oxford University Press
Summary: We propose a hierarchical Gaussian locally linear mapping structured mixture model, named HGLLiM, to predict low dimensional responses based on high dimensional covariates when the associations between the responses and the covariates are non-linear. For tractability, HGLLiM adopts inverse regression to handle the high dimension and locally linear mappings to capture potentially non-linear relations. Data with similar associations are grouped together to form a cluster. A mixture is composed of several clusters following a hierarchical structure. This structure enables shared covariance matrices and latent factors across smaller clusters to limit the number of parameters to estimate. Moreover, HGLLiM adopts a robust estimation procedure for model stability. We use three real data sets to demonstrate different features of HGLLiM. With the face data set, HGLLiM shows ability to model non-linear relationships through mixtures. With the orange juice data set, we show that the prediction performance of HGLLiM is robust to the presence of outliers. Moreover, we demonstrate that HGLLiM is capable of handling large-scale complex data by using the data acquired from a magnetic resonance vascular fingerprinting study. These examples illustrate the wide applicability of HGLLiM to handle different aspects of a complex data structure in prediction.
ISSN: 0035-9254, 1467-9876
DOI: 10.1111/rssc.12370