Oops, we missed a spot: Comparing data substitution methods for non-random missing survey data in a longitudinal study
Published in | Journal of Affective Disorders, Vol. 370, pp. 434-438 |
---|---|
Format | Journal Article |
Language | English |
Published | Netherlands: Elsevier B.V., 01.02.2025 |
ISSN | 0165-0327; 1573-2517 |
DOI | 10.1016/j.jad.2024.10.070 |
Summary: | Imputation methods for missing data are not always applicable, notably when the data are completely missing for the whole sample. To estimate such missing data, we compared three missing-item substitution methods: (1) mean substitution; (2) last observation carried forward (LOCF); and (3) regression-predicted values. A total of 384 parents reported their 8- to 18-year-old children's anxiety levels using the 9-item Screen for Child Anxiety Related Disorders (SCARED) at baseline (Time 1) and at two later time points; the sample was drawn from a larger longitudinal study (the Ontario COVID-19 and Kids' Mental Health Study). We predicted a survey item measured one month after baseline (Time 2) using: (1) the mean of the participant's remaining test items; (2) the value of the same item measured at baseline; and (3) the value predicted by a linear regression with all other test items as predictors. A within-subjects ANOVA showed a main effect of substitution method on the Time 2 total score. Post-hoc analysis indicated that mean substitution differed significantly from the actual data. Regression-predicted values overestimated the median relative to the actual values, whereas LOCF produced a comparable mean and an identical median. Similar results were found with other indicators and when the analysis was extended to a longer, 4-month interval (Time 3), suggesting that LOCF is more accurate and reliable than mean substitution or regression prediction. This study proposes that, when advanced substitution methods are not applicable, a systematic comparison of alternative methods may help researchers make a more informed decision in data processing.
Highlights: |
• Imputation methods such as multiple imputation (MI) are typically preferred for handling missing data.
• Empirical comparisons can provide reliable estimates when MI methods are not applicable.
• LOCF yielded better mean and median SCARED scores, and smaller errors, than mean substitution or regression.
• This approach may guide researchers facing similar challenges with missing time-sensitive mental health measures.
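The three substitution methods described in the summary, and a comparison of the resulting total scores against the actual Time 2 data, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' analysis code. The helper names `fit_ols`, `substitute`, and `compare` are hypothetical. The regression is assumed to be fitted on baseline (Time 1) data, where the target item was observed, and then applied to the other Time 2 items. Mean absolute error of the total score is an assumed error indicator. The within-subjects ANOVA and post-hoc tests are not reproduced.

```python
from statistics import mean, median


def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved from the normal
    equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns [intercept, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for i in range(p):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * c for a, c in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def substitute(t1, t2, k):
    """Estimate item k at Time 2 for every participant with three methods.
    t1, t2: per-participant lists of item scores in the same item order."""
    others = [j for j in range(len(t2[0])) if j != k]
    # (1) Mean substitution: mean of the participant's other Time 2 items
    mean_sub = [mean(row[j] for j in others) for row in t2]
    # (2) LOCF: the same item's value at baseline (Time 1)
    locf = [row[k] for row in t1]
    # (3) Regression (assumption): fit item k on the other items at Time 1,
    #     then predict it from the participant's other Time 2 items
    b = fit_ols([[row[j] for j in others] for row in t1],
                [row[k] for row in t1])
    reg = [b[0] + sum(bj * row[j] for bj, j in zip(b[1:], others))
           for row in t2]
    return {"mean": mean_sub, "locf": locf, "regression": reg}


def compare(t2, k, estimates):
    """Mean, median, and mean absolute error (assumed indicator) of the
    total score after replacing item k with each method's estimate."""
    actual = [sum(row) for row in t2]
    report = {"actual": (mean(actual), median(actual), 0.0)}
    for name, est in estimates.items():
        totals = [sum(row) - row[k] + e for row, e in zip(t2, est)]
        mae = mean(abs(t - a) for t, a in zip(totals, actual))
        report[name] = (mean(totals), median(totals), mae)
    return report


if __name__ == "__main__":
    # Hypothetical toy data: 5 participants, 3 items (not study data)
    t1 = [[0, 0, 1], [1, 0, 2], [0, 1, 1], [2, 1, 3], [1, 2, 2]]
    t2 = [[1, 1, 2], [0, 2, 1], [2, 0, 3], [1, 1, 2], [0, 0, 1]]
    for method, stats in compare(t2, 2, substitute(t1, t2, 2)).items():
        print(method, stats)
```

In this toy data the target item is an exact linear function of another item, so the regression recovers it exactly. That only illustrates the mechanics; it says nothing about the study's finding that LOCF performed best on real SCARED data.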