Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods

Bibliographic Details
Published in	Computational Statistics Vol. 38; no. 2; pp. 899–934
Main Authors	Lee, Shen-Ming; Le, Truong-Nhat; Tran, Phuoc-Loc; Li, Chin-Shang
Format	Journal Article
Language	English
Published	Springer Berlin Heidelberg, Berlin/Heidelberg; Springer Nature B.V.; 1 June 2023
Subjects	Data analysis; Distribution functions; Economic Theory/Quantitative Economics/Mathematical Methods; Empirical equations; Estimation; Mathematics and Statistics; Missing data; Original Paper; Probability and Statistics in Computer Science; Probability Theory and Stochastic Processes; Regression analysis; Regression models; Statistical analysis; Statistics; Inverse probability weighting; Validation likelihood; Joint conditional likelihood; Multiple imputation; Missing at random
ISSN	0943-4062, 1613-9658
DOI	10.1007/s00180-022-01250-3

Summary:	Logistic regression is a standard model in many studies of binary outcome data, and the analysis of missing data in this model is a fascinating topic. Building on the idea of Wang and Chen (Wang D, Chen SX (2009) Empirical likelihood for estimating equations with missing values. Ann Stat, 37:490–517), the authors propose two different types of multiple imputation (MI) estimation methods for the parameters of logistic regression with covariates missing at random (MAR) separately or simultaneously. Each method uses three empirical conditional distribution functions to generate random values for imputing the missing data, and the parameters are then estimated with the estimating equations of Fay (Fay RE (1996) Alternative paradigms for the analysis of imputed survey data. J Am Stat Assoc, 91:490–498). Both methods are derived under the assumption that covariates are MAR separately or simultaneously, and they apply exclusively to categorical/discrete data. Simulation studies show that the two proposed methods are computationally effective. The two methods have quite similar efficiency and outperform the complete-case, semiparametric inverse probability weighting, validation likelihood, and random-forest multiple imputation by chained equations (MICE) methods. Although the two proposed methods are comparable with the joint conditional likelihood (JCL) method, they involve more straightforward calculations and require shorter computing times than the JCL and MICE methods. Two real data examples illustrate the applicability of the proposed methods.
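The summary describes imputing missing categorical covariates by random draws from empirical conditional distribution functions, then fitting logistic regression. The sketch below illustrates that general idea only, under assumptions the abstract does not confirm: two binary covariates X1 and X2, missingness that depends on the outcome Y alone (a special case of MAR), and the three empirical conditional distributions taken as X1 | (Y, X2), X2 | (Y, X1) and (X1, X2) | Y, each estimated from the complete cases. The final estimate is a plain average of the per-imputation fits, a simplification that stands in for, and is not, the Fay (1996) estimating-equation estimator used in the paper. All function names (`fit_logistic`, `impute_once`, `mi_estimate`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random
from collections import defaultdict


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, max_iter=50, tol=1e-10):
    """ML fit of logit P(Y=1) = b0 + b1*x1 + b2*x2 by Newton-Raphson.
    rows: (y, x1, x2) tuples with no missing values."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3                      # score vector X'(y - p)
        info = [[0.0] * 3 for _ in range(3)]  # information matrix X'WX
        for y, x1, x2 in rows:
            z = (1.0, x1, x2)
            p = 1.0 / (1.0 + math.exp(-sum(bk * zk for bk, zk in zip(beta, z))))
            for i in range(3):
                grad[i] += (y - p) * z[i]
                for j in range(3):
                    info[i][j] += p * (1.0 - p) * z[i] * z[j]
        step = solve(info, grad)
        beta = [bk + sk for bk, sk in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def impute_once(data, rng):
    """Replace each None by a random draw from the matching empirical conditional
    distribution among complete cases. Assumes every needed cell is non-empty."""
    pool_x1 = defaultdict(list)    # (y, x2) -> observed x1 values
    pool_x2 = defaultdict(list)    # (y, x1) -> observed x2 values
    pool_both = defaultdict(list)  # y -> observed (x1, x2) pairs
    for y, x1, x2 in data:
        if x1 is not None and x2 is not None:
            pool_x1[(y, x2)].append(x1)
            pool_x2[(y, x1)].append(x2)
            pool_both[y].append((x1, x2))
    imputed = []
    for y, x1, x2 in data:
        if x1 is None and x2 is None:  # both missing: draw (X1, X2) | Y
            x1, x2 = rng.choice(pool_both[y])
        elif x1 is None:               # X1 missing: draw X1 | (Y, X2)
            x1 = rng.choice(pool_x1[(y, x2)])
        elif x2 is None:               # X2 missing: draw X2 | (Y, X1)
            x2 = rng.choice(pool_x2[(y, x1)])
        imputed.append((y, x1, x2))
    return imputed


def mi_estimate(data, m=10, seed=0):
    """Average the logistic-regression estimates over m imputed data sets."""
    rng = random.Random(seed)
    fits = [fit_logistic(impute_once(data, rng)) for _ in range(m)]
    return [sum(f[k] for f in fits) / m for k in range(3)]


def simulate(n, beta=(-0.5, 1.0, -0.8), seed=1):
    """Hypothetical data: binary X1, X2; missingness depends on Y only (MAR)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.4)
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x1 + beta[2] * x2)))
        y = int(rng.random() < p)
        u, scale = rng.random(), (2.0 if y == 1 else 1.0)
        if u < 0.05 * scale:
            x1 = x2 = None             # both covariates missing
        elif u < 0.15 * scale:
            x1 = None                  # only X1 missing
        elif u < 0.25 * scale:
            x2 = None                  # only X2 missing
        data.append((y, x1, x2))
    return data


if __name__ == "__main__":
    print(mi_estimate(simulate(3000)))  # compare with the true (-0.5, 1.0, -0.8)
```

Each draw conditions only on the variables that are fully observed in that missingness pattern, which is what keeps this toy scheme valid when missingness depends on Y alone. Other MAR mechanisms, standard errors, and the paper's second MI variant would need more than is shown here.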