Multinomial logistic regression with missing outcome data: An application to cancer subtypes

Bibliographic Details
Published in Statistics in medicine Vol. 39; no. 24; pp. 3299-3312
Main Authors Wang, Ching‐Yun; Hsu, Li
Format Journal Article
Language English
Published England Wiley Subscription Services, Inc 30.10.2020
ISSN 0277-6715
1097-0258
DOI 10.1002/sim.8666

Summary: Many diseases such as cancer and heart diseases are heterogeneous and it is of great interest to study the disease risk specific to the subtypes in relation to genetic and environmental risk factors. However, due to logistical and cost reasons, the subtype information for the disease is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including a bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These methods are important approaches for missing data regression. The BHMI modifies the standard hot deck multiple imputation method such that it can provide valid confidence interval estimation. Under the situation when the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators while the complete‐case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency compared with the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing among some study individuals.
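As an illustration of the SIPW idea named in the summary, the following is a minimal sketch and not the authors' implementation. It assumes a simulated data set in which the outcome has three categories, the only model covariate is binary, and missingness of the outcome depends on a hypothetical always-observed auxiliary variable (`w_aux`). Selection probabilities are estimated by the observed fraction within each discrete cell, and the multinomial logistic score is weighted by the inverse of those fractions. All function names, parameter values, and the data-generating setup are assumptions made for the example.

```python
import numpy as np

def softmax_probs(X, B):
    """Category probabilities; category 0 is the reference (linear predictor 0)."""
    eta = np.column_stack([np.zeros(X.shape[0]), X @ B])
    eta -= eta.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def fit_weighted_multinomial(X, y, w, n_classes, n_iter=50, tol=1e-10):
    """Newton-Raphson for a weighted multinomial logistic score equation."""
    n, p = X.shape
    k1 = n_classes - 1
    B = np.zeros((p, k1))
    Y = np.eye(n_classes)[y][:, 1:]  # indicators of the non-reference categories
    for _ in range(n_iter):
        P = softmax_probs(X, B)[:, 1:]
        score = X.T @ (w[:, None] * (Y - P))  # shape (p, k1)
        info = np.zeros((p * k1, p * k1))     # weighted information matrix
        for j in range(k1):
            for k in range(k1):
                d = w * P[:, j] * (float(j == k) - P[:, k])
                info[j * p:(j + 1) * p, k * p:(k + 1) * p] = X.T @ (d[:, None] * X)
        step = np.linalg.solve(info, score.T.reshape(-1))
        B = B + step.reshape(k1, p).T
        if np.max(np.abs(step)) < tol:
            break
    return B

def sipw_fit(X, y, r, cell, n_classes):
    """SIPW sketch: weight observed outcomes by 1 / (observed fraction in their cell)."""
    pi_hat = np.zeros(len(r))
    for c in np.unique(cell):
        m = cell == c
        pi_hat[m] = r[m].mean()  # assumes every cell has at least one observed outcome
    w = r / pi_hat
    y_obs = np.where(r == 1, y, 0)  # placeholder for missing outcomes (weight is 0)
    return fit_weighted_multinomial(X, y_obs, w, n_classes)

def simulate_example(n=20000, seed=0):
    """Hypothetical data: binary covariate, 3 outcome categories, auxiliary w_aux."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)
    X = np.column_stack([np.ones(n), x])
    B_true = np.array([[0.5, -0.5], [1.0, 0.5]])  # rows: intercept, slope
    P = softmax_probs(X, B_true)
    u = rng.random(n)
    y = np.minimum((u[:, None] > P.cumsum(axis=1)).sum(axis=1), 2)
    w_aux = rng.random(n) < np.where(y == 2, 0.8, 0.2)  # proxy related to the outcome
    pi = np.where(w_aux, 0.8, 0.3)                      # selection depends on w_aux only
    r = (rng.random(n) < pi).astype(float)              # 1 = outcome observed
    cell = 2 * x + w_aux.astype(int)                    # discrete cells for pi_hat
    return X, y, r, cell, B_true

X, y, r, cell, B_true = simulate_example()
B_sipw = sipw_fit(X, y, r, cell, n_classes=3)
B_cc = fit_weighted_multinomial(X, np.where(r == 1, y, 0), r, 3)  # complete-case fit
print("true:\n", B_true, "\nSIPW:\n", B_sipw, "\nCC:\n", B_cc)
```

The complete-case fit is included only for comparison. In this assumed setup the missingness depends on a variable related to the outcome, which is the kind of situation in which the summary says the CC analysis can be biased.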
Bibliography: Funding information
US National Cancer Institute grants, CA189532, CA235122, CA86368, CA239168; National Heart, Lung, and Blood Institute grant, HL130483