Analyzing differences between microbiome communities using mixture distributions

Bibliographic Details
Published in Statistics in Medicine Vol. 37; no. 27; pp. 4036-4053
Main Authors Shestopaloff, Konstantin; Escobar, Michael D.; Xu, Wei
Format Journal Article
Language English
Published England Wiley Subscription Services, Inc 30.11.2018
Summary: In this paper, we present a method to assess differences between microbiome communities that effectively models sparse count data and accounts for presence‐absence bias frequently encountered when zeros are present. We assume that the observed data for each operational taxonomic unit is Poisson generated with the rate for each sample originating from an underlying rate distribution. We propose to model this distribution using a mixture model, specifying the components based on the posterior rate distribution of a count and estimating the optimal weights using a least squares objective function. The distribution incorporates varying resolutions of samples, a point mass for differentiating structural and nonstructural zeros, and a truncation point mass to account for high values that are too sparse to model. As mixture component specification is not always straightforward, a method to estimate a joint model from several mixture distributions using minimum distances of bootstrap iterates is proposed. Once the population rate distribution is approximated, we obtain sample‐specific distributions by conditioning on the observed operational taxonomic unit count, resolution, and estimated mixture distribution and then use these to estimate pairwise distances for a permutation test. The method gives an accurate estimate of the true proportion of zeros for presence‐absence, effectively models the distribution of low counts using the mixture distribution, and achieves good power for detecting differences in a variety of scenarios. The method is tested using a simulation study and applied to two microbiome datasets. In each case, the results are compared with a number of existing methods.
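Illustration: the conditioning step described in the summary (obtaining a sample-specific distribution from the observed count, the resolution, and the estimated mixture) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the mixture consists of a point mass at zero plus Gamma(shape, rate) components, that the Poisson mean is the resolution times the underlying rate, and that the weights have already been estimated. The function names, the parameter values, and the Gamma form of the components are hypothetical choices made for illustration, and the truncation point mass is left out.

```python
import math


def nb_log_pmf(x, shape, rate, resolution):
    """Log marginal probability of count x when the Poisson mean is
    resolution * r and r ~ Gamma(shape, rate) (a negative binomial)."""
    p = rate / (rate + resolution)
    return (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(p) + x * math.log1p(-p))


def posterior_weights(x, resolution, zero_weight, gamma_weights, gamma_params):
    """Posterior component weights given one observed count.

    Index 0 is the structural-zero point mass (assumed form); the remaining
    entries follow the order of gamma_params, a list of (shape, rate) pairs.
    """
    # A structural zero can only produce a count of exactly 0.
    joint = [zero_weight * (1.0 if x == 0 else 0.0)]
    for w, (a, b) in zip(gamma_weights, gamma_params):
        joint.append(w * math.exp(nb_log_pmf(x, a, b, resolution)))
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical mixture: 30% structural zeros, two Gamma components.
zero_w = 0.3
gamma_w = [0.5, 0.2]
gamma_p = [(1.0, 2.0), (5.0, 1.0)]

post_zero_count = posterior_weights(0, 1.0, zero_w, gamma_w, gamma_p)
post_three_count = posterior_weights(3, 1.0, zero_w, gamma_w, gamma_p)
```

Under these assumptions, an observed zero raises the posterior weight of the structural-zero component above its prior weight, while any positive count sets it to exactly zero. Distances between such sample-specific distributions could then be fed into a permutation test, as the summary describes.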
ISSN: 0277-6715, 1097-0258
DOI: 10.1002/sim.7896