A comparison of various software tools for dealing with missing data via imputation

Bibliographic Details
Published in	Journal of Statistical Computation and Simulation, Vol. 81, no. 11, pp. 1653–1675
Main Authors	Cortiñas Abrahantes, José; Sotto, Cristina; Molenberghs, Geert; Vromman, Geert; Bierinckx, Bart
Format	Journal Article
Language	English
Published	Abingdon: Taylor & Francis Ltd, 1 November 2011
Subjects	Comparative analysis; Computer programs; Computer simulation; Markov analysis; missing at random; Missing data; missing not at random; Monte Carlo methods; Monte Carlo simulation; multiple imputation; Packages; Parameter estimation; random forest; Rendering; Routines; Software; Statistical inference; Statistical methods

More Information
Summary:	In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputing the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses a whole range of techniques developed to make inferences about incomplete data, from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, and packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
ISSN:	0094-9655; 1563-5163
DOI:	10.1080/00949655.2010.498788