Imputation for prediction: beware of diminishing returns
Main Authors | , |
---|---|
Format | Journal Article |
Language | English |
Published | 29.07.2024 |
Subjects | |
Summary: | Missing values are prevalent across various fields, posing challenges for
training and deploying predictive models. In this context, imputation is a
common practice, driven by the hope that accurate imputations will enhance
predictions. However, recent theoretical and empirical studies indicate that
simple constant imputation can be consistent and competitive. This empirical
study aims to clarify whether and when investing in advanced imputation methods
yields significantly better predictions. Relating imputation and predictive
accuracies across combinations of imputation and predictive models on 20
datasets, we show that imputation accuracy matters less i) when using
expressive models and ii) when incorporating missingness indicators as
complementary inputs, and that iii) it matters much more for generated linear
outcomes than for real-data outcomes. Interestingly, we also show that using
the missingness indicator benefits prediction performance, even in MCAR
(missing completely at random) scenarios. Overall, on real data with powerful
models, improving imputation has only a minor effect on prediction performance.
Thus, investing in better imputations for improved predictions often offers
limited benefits. |
---|---|
DOI: | 10.48550/arxiv.2407.19804 |
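
The summary's central recipe, constant imputation combined with missingness indicators as complementary inputs, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `impute_with_indicator` is hypothetical, the column mean is assumed as the imputation constant, and missing entries are assumed to be encoded as `None`.

```python
from statistics import mean


def impute_with_indicator(rows):
    """Mean-impute each column and append one 0/1 missingness flag per column.

    Hypothetical helper for illustration; missing entries are encoded as None.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: a fully missing column falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)

    augmented = []
    for row in rows:
        # Replace each missing value with its column mean.
        values = [col_means[j] if v is None else float(v) for j, v in enumerate(row)]
        # The mask records where values were missing, so a downstream model
        # can still use the missingness pattern itself as a signal.
        flags = [1.0 if v is None else 0.0 for v in row]
        augmented.append(values + flags)
    return augmented


X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
X_aug = impute_with_indicator(X)
```

Each output row holds the imputed features followed by the indicator columns, and that augmented matrix is what a predictive model would be trained on. In scikit-learn, `SimpleImputer(strategy="mean", add_indicator=True)` produces a comparable augmented matrix, although it adds indicator columns only for features that had missing values during fitting.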