Evaluation of predictive model performance of an existing model in the presence of missing data

Bibliographic Details
Published in	Statistics in medicine Vol. 40; no. 15; pp. 3477-3498
Main Authors	Li, Pin; Taylor, Jeremy M. G.; Spratt, Daniel E.; Karnes, R. Jeffery; Schipper, Matthew J.
Format	Journal Article
Language	English
Published	England Wiley Subscription Services, Inc 10.07.2021
Subjects	area under the ROC curve; augmented inverse probability weighting; Brier score; Computer Simulation; Data Interpretation, Statistical; Humans; inverse probability weighting; Male; Missing data; multiple imputation; Probability; Prostate cancer; ROC Curve
Online Access	Get full text
ISSN	0277-6715, 1097-0258
DOI	10.1002/sim.8978

Summary:	In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as using biomarkers to predict the risk of developing a disease in the future. The assessment of an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW, and that it is also doubly robust to misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, whereas it needs to be included in the missingness model only if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.
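The IPW idea in the summary can be illustrated with a minimal sketch. It assumes a generic weighted form of the two metrics, in which each complete case is weighted by the inverse of its estimated probability of being fully observed; the exact estimators in the article may differ. The function names `ipw_brier` and `ipw_auc` and the arguments `observed` and `pi_obs` are hypothetical, and the observation probabilities are assumed to come from a separately fitted missingness model.

```python
import numpy as np


def ipw_brier(y, p, observed, pi_obs):
    """Weighted Brier score over complete cases (illustrative sketch).

    y        : binary outcomes (0/1) for all subjects
    p        : predicted risks from the existing model (may be NaN when
               covariates are missing)
    observed : True where the covariates, and hence p, are available
    pi_obs   : assumed estimates of P(observed) from a missingness model
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    w = 1.0 / np.asarray(pi_obs, dtype=float)[observed]
    sq_err = (y[observed] - p[observed]) ** 2
    return float(np.sum(w * sq_err) / np.sum(w))


def ipw_auc(y, p, observed, pi_obs):
    """Weighted AUC over complete case/control pairs (illustrative sketch).

    Each case-control pair is weighted by the product of the two subjects'
    inverse observation probabilities; ties in p count as one half.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    w = 1.0 / np.asarray(pi_obs, dtype=float)[observed]
    y_c, p_c = y[observed], p[observed]

    case, ctrl = y_c == 1, y_c == 0
    # Pairwise differences in predicted risk: rows are cases, columns controls.
    diff = p_c[case][:, None] - p_c[ctrl][None, :]
    concordance = (diff > 0) + 0.5 * (diff == 0)
    pair_w = w[case][:, None] * w[ctrl][None, :]
    return float(np.sum(pair_w * concordance) / np.sum(pair_w))
```

When every subject has the same observation probability, the weights cancel and both functions reduce to the ordinary complete-case Brier score and AUC, which is the missing-completely-at-random case.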
Bibliography:	Funding information: U.S. National Institutes of Health, CA059827; CA129102