Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data

Bibliographic Details
Published in	PLoS ONE Vol. 10; no. 7; p. e0129606
Main Authors	Xu, Lizhen; Paterson, Andrew D.; Turpin, Williams; Xu, Wei
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 06.07.2015
Subjects	Acquired immune deficiency syndrome; AIDS; Analysis; Bias; Bioinformatics; Computer simulation; Data processing; Error detection; Gastrointestinal Microbiome - physiology; Gastrointestinal Tract - microbiology; Generalized linear models; Goodness of fit; Hospitals; Humans; Inflation (Economics); Intestinal microflora; Mathematical models; Microbiomes; Microbiota; Models, Statistical; Parameter estimation; Parameters; Public health; Regression Analysis; Statistical analysis; Studies; Taxonomy
Summary:	Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts characterized by excess zeros, which investigators often ignore. In this paper, we compare the performance of competing methods for modeling zero-inflated data through extensive simulations and application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero-inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on the structural zeros and the count component. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and bias and effectiveness of parameter estimation. We also evaluate the ability of model selection strategies based on the Akaike information criterion (AIC) or the Vuong test to identify the correct model. The simulation studies show that hurdle and zero-inflated models have well-controlled type I errors, higher power, and better goodness-of-fit measures, and are more accurate and efficient in parameter estimation. In addition, the hurdle models have similar goodness of fit and parameter estimation for the count component as their corresponding zero-inflated models. However, the estimation and interpretation of the parameters for the zero component differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero-inflated data and implement it in a gut microbiome study of more than 400 independent subjects.
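The AIC-based comparison mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' simulation design: it assumes an intercept-only setting with no covariates, compares only a plain Poisson model against a zero-inflated Poisson (ZIP) model, and fits the ZIP model with a simple EM routine. All function names and the parameter values (500 samples, 40% structural zeros, Poisson mean 3) are hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(rng, n, pi, lam):
    """Simulate n counts: structural zero with probability pi, else Poisson(lam)."""
    return [0 if rng.random() < pi else sample_poisson(rng, lam) for _ in range(n)]


def poisson_loglik(y, lam):
    """Log-likelihood of counts y under Poisson(lam)."""
    return sum(-lam + yi * math.log(lam) - math.lgamma(yi + 1) for yi in y)


def zip_loglik(y, pi, lam):
    """Log-likelihood of counts y under a zero-inflated Poisson(pi, lam)."""
    total = 0.0
    for yi in y:
        if yi == 0:
            # A zero is either structural or a sampling zero from the Poisson part.
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) - lam + yi * math.log(lam) - math.lgamma(yi + 1)
    return total


def fit_zip_em(y, n_iter=200):
    """Intercept-only ZIP fit by EM; returns (pi_hat, lam_hat)."""
    n = len(y)
    n_zero = sum(1 for yi in y if yi == 0)
    total = sum(y)
    pi, lam = 0.5, max(total / n, 1e-8)  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson mean.
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


rng = random.Random(2015)  # fixed seed so the sketch is reproducible
y = simulate_zip(rng, n=500, pi=0.4, lam=3.0)

lam_pois = sum(y) / len(y)        # Poisson MLE is the sample mean
pi_zip, lam_zip = fit_zip_em(y)

aic_pois = aic(poisson_loglik(y, lam_pois), 1)
aic_zip = aic(zip_loglik(y, pi_zip, lam_zip), 2)
print(f"Poisson AIC: {aic_pois:.1f}  ZIP AIC: {aic_zip:.1f}")
```

With data simulated under heavy zero inflation, the ZIP model should obtain the lower AIC despite its extra parameter. A hurdle model, negative binomial count components, covariates, and the Vuong test are deliberately left out of this sketch.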
Bibliography:	Conceived and designed the experiments: LX ADP WT WX. Performed the experiments: LX ADP WT WX. Analyzed the data: LX WT. Contributed reagents/materials/analysis tools: LX ADP WT WX. Wrote the paper: LX ADP WT WX. Competing Interests: Williams Turpin acknowledges a CAG/CIHR Ferring Pharmaceuticals Inc. award. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0129606