Ultra‐high dimensional variable selection for doubly robust causal inference

Bibliographic Details
Published in Biometrics Vol. 79; no. 2; pp. 903-914
Main Authors Tang, Dingke; Kong, Dehan; Pan, Wenliang; Wang, Linbo
Format Journal Article
Language English
Published United States Blackwell Publishing Ltd 01.06.2023
Summary: Causal inference has been increasingly reliant on observational studies with rich covariate information. To build tractable causal procedures, such as the doubly robust estimators, it is imperative to first extract important features from high or even ultra‐high dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultra‐high dimensional data sets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency in the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all the predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome model‐free procedure for propensity score model selection, thereby maintaining double robustness in the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal compares favorably with existing methods in a range of realistic settings. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
ISSN: 0006-341X, 1541-0420
DOI: 10.1111/biom.13625