KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA

Bibliographic Details
Published in: The Annals of Applied Statistics Vol. 12; no. 1; p. 540
Main Authors: Randolph, Timothy W; Zhao, Sen; Copeland, Wade; Hullar, Meredith; Shojaie, Ali
Format: Journal Article
Language: English
Published: United States 01.03.2018
Summary: The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
ISSN: 1932-6157
DOI: 10.1214/17-AOAS1102
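
The summary names three ingredients: a similarity (kernel) matrix derived from distances as in principal coordinate analysis, a penalized regression that uses a kernel on the taxa, and a treatment of compositional predictors. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation. The function names (gower_kernel, clr, generalized_ridge), the toy "phylogeny" (taxa placed on a line), the exponential kernel, and the use of a centered log-ratio transform are all illustrative choices that do not come from the record above.

```python
import numpy as np


def gower_kernel(D):
    """Turn an n x n distance matrix into a centered similarity matrix.

    This is the double-centering step used in principal coordinate analysis:
    K = -1/2 * J * D^2 * J, with J the centering matrix.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J


def clr(rel_abund):
    """Centered log-ratio transform of strictly positive compositions (rows)."""
    log_x = np.log(rel_abund)
    return log_x - log_x.mean(axis=1, keepdims=True)


def generalized_ridge(Z, y, Q, lam):
    """Minimize ||y - Z b||^2 + lam * b' Q^{-1} b for a positive-definite Q.

    Uses the equivalent closed form b = Q Z' (Z Q Z' + lam I)^{-1} y,
    which avoids inverting Q explicitly.
    """
    n = Z.shape[0]
    A = Z @ Q @ Z.T + lam * np.eye(n)
    return Q @ Z.T @ np.linalg.solve(A, y)


# Toy data (assumption: simulated, not from any study).
rng = np.random.default_rng(0)
n, p = 20, 5
rel_abund = rng.dirichlet(np.ones(p), size=n)  # each row sums to 1
Z = clr(rel_abund)

# Hypothetical stand-in for phylogenetic distance: taxa placed on a line.
taxa_pos = np.arange(p, dtype=float)
D_taxa = np.abs(taxa_pos[:, None] - taxa_pos[None, :])
Q = np.exp(-D_taxa)  # illustrative positive-definite kernel on taxa

# Simulated outcome in which neighboring taxa have similar effects.
true_beta = np.array([1.0, 1.0, 0.0, -1.0, -1.0])
y = Z @ true_beta + 0.1 * rng.normal(size=n)

beta_hat = generalized_ridge(Z, y, Q, lam=1.0)
print(beta_hat)
```

With Q set to the identity matrix this reduces to ordinary ridge regression, so the kernel on the taxa is what carries the extrinsic structure into the coefficient estimates. Applied to Euclidean distances, gower_kernel returns the Gram matrix of the column-centered data, which is why principal coordinate analysis with Euclidean distance coincides with principal component analysis.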