Model misspecification and robust analysis for outcome‐dependent sampling designs under generalized linear models

Bibliographic Details
Published in	Statistics in Medicine, Vol. 42, no. 9, pp. 1338–1352
Main Authors	Maronge, Jacob M., Schildcrout, Jonathan S., Rathouz, Paul J.
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 30 April 2023
Subjects	efficiency; Generalized linear models; Humans; Likelihood Functions; Linear Models; Models, Statistical; outcome‐dependent sampling; Parametric statistics; semi‐parametric models; two‐phase studies

More Information
Summary:	Outcome‐dependent sampling (ODS) is a commonly used class of sampling designs to increase estimation efficiency in settings where response information (and possibly adjuster covariates) is available, but the exposure is expensive and/or cumbersome to collect. We focus on ODS within the context of a two‐phase study, where in Phase One the response and adjuster covariate information is collected on a large cohort that is representative of the target population, but the expensive exposure variable is not yet measured. In Phase Two, using response information from Phase One, we selectively oversample a subset of informative subjects in whom we collect expensive exposure information. Importantly, the Phase Two sample is no longer representative, and we must use ascertainment‐correcting analysis procedures for valid inferences. In this paper, we focus on likelihood‐based analysis procedures, particularly a conditional‐likelihood approach and a full‐likelihood approach. Whereas the full likelihood retains incomplete Phase One data for subjects not selected into Phase Two, the conditional likelihood explicitly conditions on Phase Two sample selection (i.e., it is a “complete case” analysis procedure). These designs and analysis procedures are typically implemented assuming a known, parametric model for the response distribution. In this paper, however, we base our analyses on a novel semi‐parametric extension to generalized linear models (SPGLM) to develop likelihood‐based procedures with improved robustness to misspecification of distributional assumptions. We specifically focus on the common setting where standard GLM distributional assumptions are not satisfied (e.g., a misspecified mean/variance relationship). We aim to provide practical design guidance and flexible tools for practitioners in these settings.
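To make the conditional-likelihood idea concrete, the sketch below computes one Phase Two subject's contribution f(y | x, selected) under a semi-parametric GLM. It rests on assumptions that are not taken from the paper: (1) the SPGLM is given the exponential-tilt form commonly used for semi-parametric GLMs, f(y | x) ∝ exp(θ y) dF(y), with θ chosen so that the mean equals g⁻¹(xᵀβ); (2) the reference distribution F is a known discrete distribution on the observed support (in a real analysis it would be estimated jointly with β); (3) Phase Two selection probabilities π(y) depend only on the response. All function names, the log link, and the example design are hypothetical.

```python
import math


def tilted_probs(support, ref_weights, theta):
    """Exponentially tilt a discrete reference distribution: p_k ∝ w_k * exp(theta * y_k)."""
    logs = [math.log(w) + theta * y for y, w in zip(support, ref_weights)]
    top = max(logs)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def tilted_mean(support, ref_weights, theta):
    """Mean of the tilted distribution; it increases monotonically in theta."""
    probs = tilted_probs(support, ref_weights, theta)
    return sum(y * p for y, p in zip(support, probs))


def solve_theta(support, ref_weights, mu, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for the tilt theta whose tilted mean equals the GLM mean mu.

    Assumes the root lies in [lo, hi]; otherwise an endpoint is returned.
    """
    if not min(support) < mu < max(support):
        raise ValueError("mu must lie strictly inside the range of the support")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(support, ref_weights, mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_loglik(y_obs, eta, support, ref_weights, select_prob, inv_link=math.exp):
    """Ascertainment-corrected log f(y | x, selected), selection depending only on y.

    f(y | x, S=1) = pi(y) f(y | x) / sum_u pi(u) f(u | x)
    """
    mu = inv_link(eta)  # GLM mean; the default assumes a log link, mu = exp(x'beta)
    theta = solve_theta(support, ref_weights, mu)
    probs = tilted_probs(support, ref_weights, theta)
    numerator = select_prob(y_obs) * probs[support.index(y_obs)]
    denominator = sum(select_prob(u) * p for u, p in zip(support, probs))
    return math.log(numerator / denominator)


if __name__ == "__main__":
    support = [0, 1, 2, 3, 4, 5]  # hypothetical observed response values
    ref_weights = [1 / 6] * 6      # hypothetical (uniform) reference distribution

    def oversample_extremes(y):
        # Hypothetical ODS design: keep all extreme responses, 20% of the middle.
        return 1.0 if y == 0 or y >= 4 else 0.2

    def keep_all(_y):
        return 1.0

    eta = math.log(2.0)  # hypothetical linear predictor x'beta
    for y in support:
        cond = conditional_loglik(y, eta, support, ref_weights, oversample_extremes)
        uncond = conditional_loglik(y, eta, support, ref_weights, keep_all)
        print(y, round(cond, 4), round(uncond, 4))
```

Dividing by the selection-weighted normalizer is what makes this a complete-case, ascertainment-corrected contribution: with a constant π(y) the correction cancels and the ordinary likelihood is recovered. A full-likelihood analysis would additionally use the Phase One data on subjects not selected into Phase Two, which this sketch does not attempt.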
Bibliography:	Funding information: National Heart, Lung, and Blood Institute, Grant/Award Number: R01HL094786; University of Wisconsin Morse Society. Present Address: 1400 Pressler St, Houston, TX, 77030
ISSN:	0277-6715; 1097-0258
DOI:	10.1002/sim.9673