Oops, we missed a spot: Comparing data substitution methods for non-random missing survey data in a longitudinal study

Bibliographic Details
Published in	Journal of Affective Disorders Vol. 370; pp. 434–438
Main Authors	Cheung, Theodore C.K., Cost, Katherine T., Esser, Kayla, Anagnostou, Evdokia, Birken, Catherine S., Charach, Alice, Monga, Suneeta, Korczak, Daphne J., Crosbie, Jennifer
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 1 February 2025
Subjects	Adolescent; Adolescents; Anxiety Disorders - diagnosis; Child; Children; COVID-19; Data Interpretation, Statistical; Data substitution; Female; Humans; Last observation carried forward; Longitudinal Studies; Male; Missing data; Psychiatric/Mental Health; SARS-CoV-2; Surveys and Questionnaires
ISSN	0165-0327, 1573-2517
DOI	10.1016/j.jad.2024.10.070

Summary:	Imputation methods for missing data are not always applicable, in particular when the data are missing for the entire sample. To estimate such missing data, we compared three missing-item substitution methods: (1) mean substitution; (2) last observation carried forward (LOCF); and (3) regression-predicted values. Drawing on a larger longitudinal study (the Ontario COVID-19 and Kids' Mental Health Study), 384 parents reported their 8- to 18-year-old children's anxiety using the 9-item Screen for Child Anxiety Related Disorders (SCARED) at baseline (Time 1) and at two later time points. We predicted one survey item measured one month after baseline (Time 2) using: (1) the mean of the remaining test items; (2) the value of the same item measured at baseline; and (3) the value predicted by a linear regression with all other test items as predictors. A within-subjects ANOVA showed a main effect of substitution method on the Time 2 total score. Post-hoc analysis indicated that mean substitution differed significantly from the actual data. Regression-predicted values overestimated the median relative to the actual values, whereas LOCF produced comparable means and identical medians. Similar results were obtained with other indicators and when the analysis was extended to a longer, 4-month interval (Time 3), suggesting that LOCF is more accurate and reliable than mean substitution or regression prediction. This study proposes that, when advanced substitution methods are not applicable, a systematic comparison of alternative methods may help researchers reach a more informed decision about data processing.
Highlights:
•Imputation methods such as multiple imputation (MI) are typically preferred for handling missing data.
•Empirical comparisons can provide reliable estimates when MI methods are not applicable.
•LOCF yielded more accurate mean and median SCARED scores, and smaller errors, than mean substitution and regression.
•This approach may guide researchers facing similar challenges with missing time-sensitive mental health measures.
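The three substitution rules described in the summary can be sketched as follows. This is a minimal illustration in Python/NumPy under stated assumptions, not the authors' code: the data are synthetic (randomly generated, not taken from the Ontario COVID-19 and Kids' Mental Health Study); items are assumed to be scored 0 to 2; "the mean of the remaining test items" is read as each respondent's own mean across the other eight items; and the regression is assumed to include an intercept and to be fit at baseline, where the target item is observed, then applied to the Time 2 values of the other items. All function and variable names (`mean_substitution`, `locf`, `regression_prediction`, `summarize`, `target`) are hypothetical. The sketch compares only the mean, median, and mean absolute error of each reconstructed Time 2 total against the actual total.

```python
import numpy as np


def mean_substitution(t2_other: np.ndarray) -> np.ndarray:
    """Person-mean substitution: each respondent's mean of the other Time 2 items."""
    return t2_other.mean(axis=1)


def locf(t1_item: np.ndarray) -> np.ndarray:
    """Last observation carried forward: reuse the same item's baseline value."""
    return np.asarray(t1_item, dtype=float)


def regression_prediction(t1_other: np.ndarray, t1_item: np.ndarray,
                          t2_other: np.ndarray) -> np.ndarray:
    """OLS of the target item on all other items.

    Assumption: the model is fit at baseline (where the target item is observed)
    and then applied to the Time 2 values of the other items.
    """
    x1 = np.column_stack([np.ones(len(t1_other)), t1_other])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(x1, t1_item, rcond=None)
    x2 = np.column_stack([np.ones(len(t2_other)), t2_other])
    return x2 @ coef


def summarize(estimated_total: np.ndarray, actual_total: np.ndarray) -> dict:
    """Mean, median, and mean absolute error of an estimated total score."""
    return {
        "mean": float(np.mean(estimated_total)),
        "median": float(np.median(estimated_total)),
        "mae": float(np.mean(np.abs(estimated_total - actual_total))),
    }


# Synthetic data only (hypothetical): 384 respondents, 9 items scored 0 to 2.
rng = np.random.default_rng(0)
n, k, target = 384, 9, 4                            # `target` = item treated as missing
item_offset = rng.normal(0.0, 0.3, size=k)          # items differ in typical level
trait_t1 = rng.normal(0.8, 0.5, size=n)             # latent anxiety at Time 1
trait_t2 = trait_t1 + rng.normal(0.0, 0.2, size=n)  # correlated latent anxiety at Time 2


def simulate_items(trait: np.ndarray) -> np.ndarray:
    """Round a noisy latent score per item and clip it to the 0 to 2 range."""
    noisy = trait[:, None] + item_offset + rng.normal(0.0, 0.6, size=(n, k))
    return np.clip(np.rint(noisy), 0, 2)


t1, t2 = simulate_items(trait_t1), simulate_items(trait_t2)
others = [j for j in range(k) if j != target]
observed_t2 = t2[:, others].sum(axis=1)  # total of the 8 observed Time 2 items
actual_total = t2.sum(axis=1)            # true 9-item total (known only because data are simulated)

estimates = {
    "mean substitution": mean_substitution(t2[:, others]),
    "LOCF": locf(t1[:, target]),
    "regression": regression_prediction(t1[:, others], t1[:, target], t2[:, others]),
}
results = {name: summarize(observed_t2 + est, actual_total) for name, est in estimates.items()}

print(f"{'actual':18s} mean={actual_total.mean():.2f} median={np.median(actual_total):.2f}")
for name, s in results.items():
    print(f"{name:18s} mean={s['mean']:.2f} median={s['median']:.2f} MAE={s['mae']:.2f}")
```

Because the simulated data stand in for the real survey, the printed numbers say nothing about the study's findings; the sketch only shows the mechanics of each rule. A real comparison would also test the differences between methods statistically (the summary reports a within-subjects ANOVA with post-hoc tests), which this sketch omits.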