Model misspecification and robust analysis for outcome‐dependent sampling designs under generalized linear models

Bibliographic Details
Published in: Statistics in Medicine, Vol. 42, no. 9, pp. 1338-1352
Main Authors: Maronge, Jacob M.; Schildcrout, Jonathan S.; Rathouz, Paul J.
Format: Journal Article
Language: English
Published: John Wiley & Sons, Inc., Hoboken, USA; Wiley Subscription Services, Inc.
Publication date: 30.04.2023
Summary: Outcome‐dependent sampling (ODS) is a commonly used class of sampling designs to increase estimation efficiency in settings where response information (and possibly adjuster covariates) is available, but the exposure is expensive and/or cumbersome to collect. We focus on ODS within the context of a two‐phase study, where in Phase One the response and adjuster covariate information is collected on a large cohort that is representative of the target population, but the expensive exposure variable is not yet measured. In Phase Two, using response information from Phase One, we selectively oversample a subset of informative subjects in whom we collect expensive exposure information. Importantly, the Phase Two sample is no longer representative, and we must use ascertainment‐correcting analysis procedures for valid inferences. In this paper, we focus on likelihood‐based analysis procedures, particularly a conditional‐likelihood approach and a full‐likelihood approach. Whereas the full likelihood retains incomplete Phase One data for subjects not selected into Phase Two, the conditional likelihood explicitly conditions on Phase Two sample selection (i.e., it is a “complete case” analysis procedure). These designs and analysis procedures are typically implemented assuming a known, parametric model for the response distribution. In this paper, however, we implement analyses using a novel semi‐parametric extension to generalized linear models (SPGLM) to develop likelihood‐based procedures with improved robustness to misspecification of distributional assumptions. We specifically focus on the common setting where standard GLM distributional assumptions are not satisfied (e.g., a misspecified mean/variance relationship). We aim to provide practical design guidance and flexible tools for practitioners in these settings.
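To make the two-phase design and the conditional-likelihood idea concrete, here is a minimal Python sketch under stated assumptions. It simulates a hypothetical Phase One cohort (response Y, adjuster Z, and an exposure X that would be unmeasured in practice), draws a Phase Two sample that oversamples both tails of Y with known stratum-specific probabilities `pi`, and fits a normal linear model by maximizing the ascertainment-corrected conditional likelihood p(y | x, z, S = 1) = π(y) p(y | x, z) / Σ_k π_k P(Y ∈ stratum k | x, z). This is the standard fully parametric (normal) conditional likelihood, not the paper's SPGLM procedure; the coefficients, cutpoints, sampling probabilities, and the function name `neg_cond_loglik` are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2023)

# --- Phase One: response Y and adjuster Z on a large cohort. ---
# X is simulated here only so that it can be "measured" in Phase Two.
N = 5000
beta_true = np.array([0.5, 1.0, -0.5])  # hypothetical: intercept, exposure X, adjuster Z
z = rng.normal(size=N)
x = 0.5 * z + rng.normal(size=N)        # exposure correlated with the adjuster
y = beta_true[0] + beta_true[1] * x + beta_true[2] * z + rng.normal(size=N)

# --- Phase Two: stratify on Y and oversample both tails. ---
cuts = np.quantile(y, [0.1, 0.9])             # hypothetical cutpoints from Phase One
edges = np.concatenate(([-np.inf], cuts, [np.inf]))
pi = np.array([1.0, 0.1, 1.0])                # known selection probability per Y stratum
stratum = np.digitize(y, cuts)                # 0 = low tail, 1 = middle, 2 = high tail
selected = rng.uniform(size=N) < pi[stratum]  # Bernoulli sampling within strata

X2 = np.column_stack([np.ones(selected.sum()), x[selected], z[selected]])
y2 = y[selected]


def neg_cond_loglik(theta):
    """Negative ascertainment-corrected log-likelihood for a normal linear model.

    Each Phase Two subject contributes log p(y | x, z) - log P(S = 1 | x, z),
    where P(S = 1 | x, z) = sum_k pi_k * P(Y in stratum k | x, z).
    The log pi(y) term does not depend on theta and is dropped.
    """
    beta, sigma = theta[:3], np.exp(theta[3])  # log scale keeps sigma positive
    mu = X2 @ beta
    log_dens = norm.logpdf(y2, loc=mu, scale=sigma)
    upper = norm.cdf((edges[1:, None] - mu) / sigma)   # shape (3, n)
    lower = norm.cdf((edges[:-1, None] - mu) / sigma)
    p_select = pi @ (upper - lower)                    # shape (n,)
    return -np.sum(log_dens - np.log(p_select))


# Naive OLS on the Phase Two sample ignores the response-dependent selection.
naive, *_ = np.linalg.lstsq(X2, y2, rcond=None)
start = np.append(naive, np.log(np.std(y2 - X2 @ naive)))
fit = minimize(neg_cond_loglik, start, method="BFGS")
corrected = fit.x[:3]

print("naive OLS (Phase Two only):", np.round(naive, 3))
print("conditional likelihood:    ", np.round(corrected, 3))
```

The naive fit treats the non-representative Phase Two sample as if it were a random sample, whereas the corrected fit should recover the hypothetical exposure coefficient of 1.0 up to sampling error. Because this sketch assumes the normal model is correct, it does not illustrate the robustness to distributional misspecification that motivates the SPGLM approach.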
Funding information: National Heart, Lung, and Blood Institute, Grant/Award Number: R01HL094786; University of Wisconsin Morse Society
Present Address: 1400 Pressler St, Houston, TX, 77030
ISSN: 0277-6715; 1097-0258
DOI: 10.1002/sim.9673