Multinomial logistic regression with missing outcome data: An application to cancer subtypes

Bibliographic Details
Published in	Statistics in Medicine, Vol. 39, no. 24, pp. 3299–3312
Main Authors	Wang, Ching‐Yun; Hsu, Li
Format	Journal Article
Language	English
Published	England: Wiley Subscription Services, Inc., 30 October 2020
Subjects	Cancer; Colorectal cancer; hot deck multiple imputation; inverse probability weighting; missing at random
ISSN	0277-6715; 1097-0258
DOI	10.1002/sim.8666

Summary:	Many diseases, such as cancer and heart disease, are heterogeneous, and it is of great interest to study subtype-specific disease risk in relation to genetic and environmental risk factors. However, for logistical and cost reasons, subtype information is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These are important approaches to regression with missing data. The BHMI modifies the standard hot deck multiple imputation method so that it provides valid confidence interval estimation. When the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators, whereas the complete-case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency over the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing for some study individuals.
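The SIPW, AIPW, and EEE estimators and the CC comparison described in the summary can be illustrated with a small simulation. This is a minimal NumPy sketch under hypothetical assumptions, not the authors' implementation: the outcome coding (0 = control, 1 or 2 = subtype), the binary risk factor X, the binary auxiliary variable A that is associated with subtype 2, the missingness mechanism, and the helper names `probs`, `scores`, and `fit_mlogit` are all invented for illustration. Selection probabilities are estimated as cell proportions over the discrete observed variables (X, A). The AIPW and EEE estimating functions are written in generic textbook forms, with E(S | X, A) estimated by the cell mean of the observed scores, which may differ in detail from the definitions in the paper. BHMI is not shown.

```python
import numpy as np


def probs(beta, Z):
    """Category probabilities (n, K+1) for a multinomial logit; category 0 is the reference."""
    eta = np.column_stack([np.zeros(len(Z)), Z @ beta.T])
    eta -= eta.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)


def scores(beta, Z, Y, K):
    """Per-subject score vectors (n, K*p); rows with Y == -1 (missing) are placeholders."""
    P = probs(beta, Z)[:, 1:]
    Yk = (Y[:, None] == np.arange(1, K + 1)).astype(float)
    return ((Yk - P)[:, :, None] * Z[:, None, :]).reshape(len(Z), -1)


def fit_mlogit(Z, Y, w, K, tol=1e-10, max_iter=100):
    """Newton-Raphson for the weighted estimating equation sum_i w_i S_i(beta) = 0."""
    p = Z.shape[1]
    beta = np.zeros((K, p))
    for _ in range(max_iter):
        P = probs(beta, Z)[:, 1:]
        U = (w[:, None] * scores(beta, Z, Y, K)).sum(axis=0)
        # Weighted information: sum_i w_i (diag(P_i) - P_i P_i^T) kron (z_i z_i^T)
        D = P[:, :, None] * np.eye(K) - P[:, :, None] * P[:, None, :]
        H = np.einsum("i,ikl,ia,ib->kalb", w, D, Z, Z).reshape(K * p, K * p)
        step = np.linalg.solve(H, U)
        beta = beta + step.reshape(K, p)
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulation: Y = 0 (control) or subtype 1..K, binary risk factor X,
# binary auxiliary variable A associated with subtype 2, missingness driven by A.
rng = np.random.default_rng(2020)
n, K = 20000, 2
true_beta = np.array([[-1.0, 0.5], [-1.5, 1.0]])
X = rng.binomial(1, 0.4, n)
Z = np.column_stack([np.ones(n), X]).astype(float)
u = rng.random(n)
Y = np.minimum((u[:, None] > probs(true_beta, Z).cumsum(axis=1)).sum(axis=1), K)
A = rng.binomial(1, np.where(Y == 2, 0.8, 0.3))
R = rng.binomial(1, np.where(A == 1, 0.9, 0.3))  # MAR: depends on observed A only
Y_obs = np.where(R == 1, Y, -1)                   # -1 marks a missing outcome

# Selection probabilities estimated nonparametrically as cell proportions over (X, A).
cell = 2 * X + A
n_cell = np.bincount(cell, minlength=4)
n_obs = np.bincount(cell, weights=R, minlength=4)
pi_hat = (n_obs / n_cell)[cell]

cc_beta = fit_mlogit(Z, Y_obs, R.astype(float), K)  # complete-case analysis
sipw_beta = fit_mlogit(Z, Y_obs, R / pi_hat, K)     # simple IPW

# Compare the three estimating functions at an arbitrary beta (here true_beta).
S = scores(true_beta, Z, Y_obs, K)
sums = np.zeros((4, S.shape[1]))
np.add.at(sums, cell[R == 1], S[R == 1])
E_hat = (sums / n_obs[:, None])[cell]  # cell mean of observed scores, estimate of E(S | X, A)
wt = (R / pi_hat)[:, None]
sipw = (wt * S).sum(axis=0)
aipw = (wt * S + (1 - wt) * E_hat).sum(axis=0)
eee = (R[:, None] * S + (1 - R)[:, None] * E_hat).sum(axis=0)

print("CC:  ", cc_beta.round(3))
print("SIPW:", sipw_beta.round(3))
print("SIPW = AIPW = EEE:", np.allclose(sipw, aipw, atol=1e-6) and np.allclose(sipw, eee, atol=1e-6))
```

With discrete covariates the identity follows from the cell-proportion estimates. In a cell with n_c subjects of whom m_c are observed, the estimated selection probability is m_c/n_c, so the AIPW augmentation contributes (n_c - n_c) times the cell-mean score, which is zero, and the EEE contribution m_c·Ê + (n_c - m_c)·Ê = n_c·Ê equals the SIPW contribution (n_c/m_c) times the sum of the observed scores. Because R depends on A, and A is associated with subtype 2, the complete-case fit should shift the subtype-2 intercept away from its true value, while the SIPW fit should stay close to `true_beta`.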
Funding:	US National Cancer Institute grants CA189532, CA235122, CA86368, and CA239168; National Heart, Lung, and Blood Institute grant HL130483