Variable selection and regression analysis for the prediction of mortality rates associated with foodborne diseases

Bibliographic Details
Published in Epidemiology and infection Vol. 144; no. 9; pp. 1959 - 1973
Main Authors AMENE, E., HANSON, L. A., ZAHN, E. A., WILD, S. R., DÖPFER, D.
Format Journal Article
Language English
Published Cambridge, UK Cambridge University Press 01.07.2016
Summary: The purpose of this study was to apply a novel statistical method for variable selection and a model-based approach for filling data gaps in mortality rates associated with foodborne diseases using the WHO Vital Registration mortality dataset. Correlation analysis and elastic net regularization methods were applied to drop redundant variables and to select the most meaningful subset of predictors. Whenever predictor data were missing, multiple imputation was used to fill in plausible values. Cluster analysis was applied to identify similar groups of countries based on the values of the predictors. Finally, a Bayesian hierarchical regression model was fit to the final dataset for predicting mortality rates. From 113 potential predictors, 32 were retained after correlation analysis. Out of these 32 predictors, eight with non-zero coefficients were selected using the elastic net regularization method. Based on the values of these variables, four clusters of countries were identified. The uncertainty of predictions was large for countries within clusters lacking mortality rates, and it was low for a cluster that had mortality rate information. Our results demonstrated that, using Bayesian hierarchical regression models, a data-driven clustering of countries and a meaningful subset of predictors can be used to fill data gaps in foodborne disease mortality.
ISSN:0950-2688
1469-4409
DOI:10.1017/S0950268815003234
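
Illustrative sketch: the Summary describes a two-stage variable selection, first dropping redundant predictors by correlation analysis and then keeping only predictors with non-zero elastic net coefficients. The Python below is a minimal sketch of that idea on synthetic data and is not the authors' code. The correlation cutoff of 0.9, the penalty settings `alpha` and `l1_ratio`, the function names, and the toy data are all assumptions made for illustration; the record does not state the values used in the study. Multiple imputation, clustering, and the Bayesian hierarchical model are not covered here.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedily keep columns whose absolute Pearson correlation with every
    already-kept column is at most `threshold` (assumed cutoff)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the elastic net objective
    (1/(2n))||y - Xb||^2 + alpha*l1_ratio*||b||_1
                        + 0.5*alpha*(1 - l1_ratio)*||b||^2,
    fitted on standardized columns (assumes no constant column)."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    resid = y - y.mean()
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid += Xs[:, j] * beta[j]      # partial residual without column j
            rho = Xs[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                1.0 + alpha * (1.0 - l1_ratio)
            )
            resid -= Xs[:, j] * beta[j]      # restore the full residual
    return beta

# Synthetic stand-in for a predictor table: five independent columns
# plus a near-duplicate of column 0 (hypothetical data, not WHO data).
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=(n, 5))
duplicate = base[:, [0]] + 0.01 * rng.normal(size=(n, 1))
X = np.hstack([base, duplicate])
y = 2.0 * base[:, 0] - 1.5 * base[:, 2] + 0.1 * rng.normal(size=n)

kept = drop_correlated(X)                    # stage 1: correlation filter
beta = elastic_net(X[:, kept], y)            # stage 2: elastic net
selected = [kept[j] for j in range(len(kept)) if beta[j] != 0.0]
print("kept after correlation filter:", kept)
print("selected by elastic net:", selected)
```

In practice the penalty strength would normally be chosen by cross-validation rather than fixed in advance, and a maintained library implementation would be preferable to this hand-written solver.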