The impact of imputation methods on the performance of Phase I Hotelling's T² control chart

Bibliographic Details
Published in: Communications in Statistics. Simulation and Computation, Vol. 54, no. 6, pp. 2076-2088
Main Authors: Wilson, Carla; Cohen, Achraf
Format: Journal Article
Language: English
Published: Taylor & Francis, 03.06.2025

Summary: The objective of this study was to evaluate the impact of three methods of handling missing data on the performance of the Phase I Hotelling's T² multivariate control chart. Using a Monte Carlo simulation, we studied the average, median, and standard deviation of the run length of the chart applied to multivariate data imputed by mean substitution, regression imputation, and predictive mean matching, at three levels of missingness (1%, 10%, and 25%) and three levels of correlation between variables (0.2, 0.4, and 0.8). We found that predictive mean matching yields average run length results comparable to those of the complete in-control data set at all levels of missingness and correlation, whereas the performance of mean substitution deteriorated at high levels of missingness and under strong correlation. Based on the simulation (multivariate normal data), we concluded that predictive mean matching is superior to both regression imputation and mean substitution for imputing missing values in the analysis of a Phase I Hotelling's T² control chart. Two applications are presented, using the Altenrhein wastewater treatment plant and olive oil datasets.
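The summary names the three imputation methods and the Phase I chart but gives no formulas. The sketch below shows one way such a comparison could be set up; it is not the authors' code, and every detail is an assumption. It computes the Phase I statistic for individual observations as T²_i = (x_i - x̄)ᵀ S⁻¹ (x_i - x̄) from the sample mean x̄ and covariance S, with the beta-distribution limit UCL = ((m - 1)²/m) · B(1 - α; p/2, (m - p - 1)/2) commonly used for Phase I individual data. Missingness is MCAR. Regression imputation is a deterministic, iterated variant started from mean-filled values, and predictive mean matching is simplified: fitted coefficients are used for donors and recipients alike, and the imputed value is copied from one of the k = 5 observed donors with the closest predicted means. The function names, α = 0.005, m = 50, and the "signal rate" summary are hypothetical choices. The sketch requires NumPy and SciPy.

```python
import numpy as np
from scipy import stats


def t2_phase1(X):
    """Phase I T^2 for individual observations using the sample mean and covariance."""
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", d, S_inv, d)


def ucl_phase1(m, p, alpha=0.005):
    """Beta-based Phase I upper control limit for m observations on p variables."""
    return (m - 1) ** 2 / m * stats.beta.ppf(1 - alpha, p / 2, (m - p - 1) / 2)


def mean_sub(X):
    """Mean substitution: replace each NaN with its column's observed mean."""
    Xf = X.copy()
    rows, cols = np.where(np.isnan(Xf))
    Xf[rows, cols] = np.nanmean(X, axis=0)[cols]
    return Xf


def _fit_predict(Xf, mask, j):
    """OLS of column j on the other columns, fitted on rows where column j was observed."""
    others = [k for k in range(Xf.shape[1]) if k != j]
    A = np.column_stack([np.ones(len(Xf)), Xf[:, others]])
    obs = ~mask[:, j]
    coef, *_ = np.linalg.lstsq(A[obs], Xf[obs, j], rcond=None)
    return A @ coef, obs


def regression_impute(X, n_iter=5):
    """Deterministic regression imputation (hypothetical iterated variant)."""
    mask = np.isnan(X)
    Xf = mean_sub(X)  # start from mean-filled values
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if mask[:, j].any():
                pred, obs = _fit_predict(Xf, mask, j)
                Xf[~obs, j] = pred[~obs]  # overwrite only the missing cells
    return Xf


def pmm_impute(X, k=5, n_iter=5, rng=None):
    """Simplified predictive mean matching: copy an observed value from one of the
    k donors whose predicted means are closest to the recipient's predicted mean."""
    rng = np.random.default_rng(rng)
    mask = np.isnan(X)
    Xf = mean_sub(X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not mask[:, j].any():
                continue
            pred, obs = _fit_predict(Xf, mask, j)
            donor_pred, donor_val = pred[obs], X[obs, j]  # donors: observed cells only
            for i in np.flatnonzero(~obs):
                nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
                Xf[i, j] = donor_val[rng.choice(nearest)]
    return Xf


def signal_rate(impute, reps=200, m=50, p=3, rho=0.8, miss=0.25, alpha=0.005, seed=0):
    """Share of in-control Phase I points with T^2 above the UCL after imputation."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)  # equicorrelated, unit variances
    ucl = ucl_phase1(m, p, alpha)
    signals = 0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), cov, size=m)
        X[rng.random(X.shape) < miss] = np.nan  # MCAR missingness
        signals += np.sum(t2_phase1(impute(X)) > ucl)
    return signals / (reps * m)


if __name__ == "__main__":
    methods = {
        "complete": (lambda X: X, 0.0),
        "mean": (mean_sub, 0.25),
        "regression": (regression_impute, 0.25),
        "pmm": (lambda X: pmm_impute(X, rng=1), 0.25),
    }
    for name, (impute, miss) in methods.items():
        print(f"{name:>10}: signal rate {signal_rate(impute, miss=miss):.4f}")
```

Varying `rho` over 0.2, 0.4, and 0.8 and `miss` over 0.01, 0.10, and 0.25 mirrors the design grid described in the summary. Reading the reciprocal of the signal rate as an average run length is only a rough proxy, because Phase I T² values computed from shared estimates are not independent.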
ISSN: 0361-0918; 1532-4141
DOI: 10.1080/03610918.2024.2310689