A practical comparison of single and multiple imputation methods to handle complex missing data in air quality datasets

Bibliographic Details
Published in Chemometrics and intelligent laboratory systems Vol. 134; pp. 23-33
Main Authors Gómez-Carracedo, M.P., Andrade, J.M., López-Mahía, P., Muniategui, S., Prada, D.
Format Journal Article
Language English
Published Elsevier B.V. 15.05.2014
Summary: Datasets with missing data ratios ranging from 24% to 4%, corresponding to three air quality monitoring studies, were used to ascertain whether major differences occur when five currently used imputation methods are applied (four single imputation methods and one multiple imputation method). Unrotated and Varimax-rotated factor analyses performed on the imputed datasets were compared. All methods performed similarly, although multiple imputation yielded more dispersed imputed values. The main differences occurred when a variable with missing values correlated poorly with the other features and when a variable had relevant loadings in several unrotated factors, which sometimes changed the order of the rotated factors.
• Five imputation methods were tested on three real datasets with missing data.
• Varimax rotation was used to compare results from a practical viewpoint.
• Multiple imputation yielded more scattered values under certain circumstances.
• Rotated factors change their order when variables influence several components.
• Expectation–maximization and iterative use of scores and loadings performed best.
ISSN: 0169-7439, 1873-3239
DOI: 10.1016/j.chemolab.2014.02.007