Analyzing differences between microbiome communities using mixture distributions

Bibliographic Details
Published in	Statistics in Medicine Vol. 37; no. 27; pp. 4036–4053
Main Authors	Shestopaloff, Konstantin; Escobar, Michael D.; Xu, Wei
Format	Journal Article
Language	English
Published	Wiley Subscription Services, Inc., England, 30 November 2018
Subjects	Ecology; microbiome; Microorganisms; mixture models; sparsity; statistical ecology; Taxonomy; zero inflation

More Information
Summary:	In this paper, we present a method to assess differences between microbiome communities that effectively models sparse count data and accounts for presence‐absence bias frequently encountered when zeros are present. We assume that the observed data for each operational taxonomic unit is Poisson generated with the rate for each sample originating from an underlying rate distribution. We propose to model this distribution using a mixture model, specifying the components based on the posterior rate distribution of a count and estimating the optimal weights using a least squares objective function. The distribution incorporates varying resolutions of samples, a point mass for differentiating structural and nonstructural zeros, and a truncation point mass to account for high values that are too sparse to model. As mixture component specification is not always straightforward, a method to estimate a joint model from several mixture distributions using minimum distances of bootstrap iterates is proposed. Once the population rate distribution is approximated, we obtain sample‐specific distributions by conditioning on the observed operational taxonomic unit count, resolution, and estimated mixture distribution and then use these to estimate pairwise distances for a permutation test. The method gives an accurate estimate of the true proportion of zeros for presence‐absence, effectively models the distribution of low counts using the mixture distribution, and achieves good power for detecting differences in a variety of scenarios. The method is tested using a simulation study and applied to two microbiome datasets. In each case, the results are compared with a number of existing methods.
ISSN:	0277-6715; 1097-0258
DOI:	10.1002/sim.7896
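The summary says the mixture weights are estimated with a least squares objective but does not give the objective's exact form. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes: the rate distribution is approximated by point masses on a fixed grid of rates (a rate of 0.0 stands in for the structural-zero point mass), each count in sample i is Poisson with mean N_i times the rate, and the weights minimize the squared difference between the empirical frequencies of counts 0..max_count (plus one pooled tail bin, loosely standing in for the truncation point mass) and the frequencies implied by the mixture. Projected gradient descent onto the probability simplex is swapped in as a dependency-free solver. The names fit_weights_least_squares, project_to_simplex, and max_count are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu); mu == 0 is a point mass at zero."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_weights_least_squares(counts, resolutions, support, max_count, iters=5000):
    """Hypothetical least-squares fit of mixture weights over a grid of rates.

    counts      -- observed counts of one OTU, one per sample
    resolutions -- per-sample resolution N_i (e.g. total reads); same length as counts
    support     -- grid of per-read rates; 0.0 acts as the structural-zero point mass
    max_count   -- counts above this are pooled into a single tail bin
    """
    n, K = len(counts), len(support)
    bins = max_count + 2  # bins 0..max_count, plus one tail bin for larger counts
    p_hat = [0.0] * bins  # empirical bin frequencies
    for y in counts:
        p_hat[min(y, max_count + 1)] += 1.0 / n
    # A[b][k]: probability of bin b under rate support[k], averaged over samples
    A = [[0.0] * K for _ in range(bins)]
    for k, lam in enumerate(support):
        for N in resolutions:
            head = 0.0
            for y in range(max_count + 1):
                p = poisson_pmf(y, N * lam)
                A[y][k] += p / n
                head += p
            A[max_count + 1][k] += max(1.0 - head, 0.0) / n
    # Step 1/L, where L = 2 * ||A||_F^2 bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    w = [1.0 / K] * K
    for _ in range(iters):
        # Residuals r_b = p_hat_b - (A w)_b; gradient of sum_b r_b^2 is -2 A^T r
        r = [p_hat[b] - sum(A[b][k] * w[k] for k in range(K)) for b in range(bins)]
        grad = [-2.0 * sum(r[b] * A[b][k] for b in range(bins)) for k in range(K)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(K)])
    return w
```

The simplex projection keeps every iterate a valid set of mixture weights, so no separate normalization step is needed; a finer rate grid or continuous components would bring the sketch closer to the mixture the summary describes.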
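Obtaining a sample-specific distribution by conditioning on the observed count, resolution, and estimated mixture is an application of Bayes' rule. As a simplification (an assumption, not the paper's component specification), suppose the estimated rate distribution is a set of point masses with rates lambda_k and weights w_k, where a rate of 0.0 represents the structural-zero point mass. The posterior weight of each rate given count y at resolution N is then proportional to w_k times the Poisson probability of y with mean N * lambda_k. The function name condition_on_count is hypothetical, and the truncation point mass is not modeled here.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu); mu == 0 is a point mass at zero."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def condition_on_count(support, weights, y, resolution):
    """Posterior weights over a discrete rate grid given one observed count.

    support    -- per-read rates lambda_k; 0.0 acts as the structural-zero point mass
    weights    -- estimated mixture weights w_k (non-negative, summing to 1)
    y          -- observed OTU count in this sample
    resolution -- sample resolution N; the count is modeled as Poisson(N * lambda_k)
    """
    # Bayes' rule: posterior_k is proportional to w_k * P(y | N * lambda_k)
    unnorm = [w * poisson_pmf(y, resolution * lam) for lam, w in zip(support, weights)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("count has zero probability under the estimated mixture")
    return [u / total for u in unnorm]


support = [0.0, 1e-4, 1e-3, 1e-2]
weights = [0.3, 0.3, 0.3, 0.1]
post_zero = condition_on_count(support, weights, y=0, resolution=1000)
post_three = condition_on_count(support, weights, y=3, resolution=1000)
print(round(post_zero[0], 2))  # posterior probability of a structural zero
print(post_three[0])           # a positive count rules out a structural zero
```

With these illustrative numbers, a zero count raises the structural-zero weight from its prior of 0.3 to about 0.44, because low nonzero rates also produce zeros at this resolution, while any positive count sets it to exactly 0.0. This separation of structural from sampling zeros is what the presence‐absence estimate relies on.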
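The summary states that sample-specific distributions are used to estimate pairwise distances for a permutation test, but it names neither the distance nor the test statistic. Purely as an illustrative assumption, the Python sketch below measures the distance between two samples as the total-variation distance between their distributions on a shared rate grid, summed over OTUs. It uses the mean between-group distance minus the mean within-group distance as the statistic and reports a one-sided p-value with the common (hits + 1) / (n_perm + 1) convention. All function names are hypothetical.

```python
import random


def total_variation(p, q):
    """Total-variation distance between two distributions on the same rate grid."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_distance(dists_i, dists_j):
    """Hypothetical sample distance: total variation summed over OTUs."""
    return sum(total_variation(p, q) for p, q in zip(dists_i, dists_j))


def group_statistic(D, labels):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (within if labels[i] == labels[j] else between).append(D[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(D, labels, n_perm=999, seed=0):
    """One-sided permutation p-value: large statistics indicate group differences."""
    rng = random.Random(seed)
    observed = group_statistic(D, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relabel samples, keeping group sizes fixed
        if group_statistic(D, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the labels rather than the distance matrix leaves the pairwise distances fixed, so each permutation only re-partitions existing pairs into within-group and between-group sets. The sketch needs at least two samples per group so that both sets are non-empty.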