Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes

Bibliographic Details
Published in	Journal of clinical and translational science Vol. 5; no. 1; p. e59
Main Authors	Wolf, Bethany J.; Jiang, Yunyun; Wilson, Sylvia H.; Oates, Jim C.
Format	Journal Article
Language	English
Published	England Cambridge University Press 01.01.2021
Subjects	Algorithms; Bias; Boosting; Business metrics; Estimates; Feature selection; Hypothermia; Interactions; Lupus nephritis; Methods; Nephritis; Patients; Penalized regression; Performance evaluation; Research Methods and Technology; Surgery; Two-stage algorithm; Variable selection; Variables
Summary:	Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but identifying interactions can be difficult if they are not hypothesized a priori. We evaluate the performance of several variable selection approaches for clustered binary outcomes to provide guidance for choosing between the methods. We conducted simulations comparing stepwise selection, penalized GLMM, boosted GLMM, and boosted GEE for variable selection, considering main effects and two-way interactions in data with repeatedly measured binary outcomes, and evaluated a two-stage approach to reduce bias and error in parameter estimates. We compared these approaches in two real data applications: hypothermia during surgery and treatment response in lupus nephritis. Penalized and boosted approaches recovered correct predictors and interactions more frequently than stepwise selection. Penalized GLMM recovered correct predictors more often than boosting, but included many spurious predictors. Boosted GLMM yielded parsimonious models and identified correct predictors well at large sample and effect sizes, but required excessive computation time. Boosted GEE was computationally efficient and selected relatively parsimonious models, offering a compromise between computation and parsimony. The two-stage approach reduced the bias and error in regression parameters in all approaches. Penalized and boosted approaches are effective for variable selection in data with clustered binary outcomes. The two-stage approach reduces bias and error and should be applied regardless of method. We provide guidance for choosing the most appropriate method in real applications.
Bibliography:	These authors are joint first authors.
ISSN:	2059-8661
DOI:	10.1017/cts.2020.556
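
Illustrative sketch (not part of the catalog record or the authors' code): the summary mentions selecting among main effects and two-way interactions and then applying a two-stage approach to reduce bias in the parameter estimates, without spelling out the details. One common reading of such a scheme is "select with a penalty, then refit the selected terms without a penalty", and the Python sketch below shows that idea on simulated data. Everything in it is an assumption made for illustration: the function names (expand_terms, fit_logistic, simulate, two_stage), the toy data, the penalty value, and above all the simplification of using ordinary logistic regression under working independence in place of a penalized GLMM or boosted GEE.

```python
import math
import random
from itertools import combinations


def expand_terms(rows, names):
    """Append all two-way interaction columns (products) to the main effects."""
    pairs = list(combinations(range(len(names)), 2))
    term_names = list(names) + [f"{names[a]}:{names[b]}" for a, b in pairs]
    design = [list(r) + [r[a] * r[b] for a, b in pairs] for r in rows]
    return design, term_names


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lam=0.0, step=0.3, iters=300):
    """Proximal gradient descent for logistic regression with an L1 penalty.

    lam > 0 gives a lasso-type fit; lam = 0 gives an unpenalized fit.
    The intercept is never penalized. The fixed step size assumes columns
    coded on a unit scale (here -1/+1); it is not tuned for general data.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += r
            for k in range(p):
                g[k] += r * xi[k]
        b0 -= step * g0 / n
        for k in range(p):
            z = beta[k] - step * g[k] / n
            # Soft-thresholding sets small coefficients exactly to zero.
            beta[k] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return b0, beta


def simulate(n_subjects=100, n_visits=5, seed=1):
    """Toy clustered binary data; the true terms are x1 and x2:x3 (assumed)."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n_subjects):
        u = rng.gauss(0.0, 0.5)  # subject-level random intercept
        for _ in range(n_visits):
            x = [rng.choice([-1.0, 1.0]) for _ in range(4)]
            eta = -0.2 + 1.0 * x[0] + 1.0 * x[1] * x[2] + u
            rows.append(x)
            y.append(1.0 if rng.random() < sigmoid(eta) else 0.0)
    return rows, y


def two_stage(rows, y, names, lam=0.05):
    """Stage 1: L1-penalized selection. Stage 2: unpenalized refit."""
    X, terms = expand_terms(rows, names)
    _, beta1 = fit_logistic(X, y, lam=lam)
    keep = [k for k, b in enumerate(beta1) if b != 0.0]
    X_sel = [[xi[k] for k in keep] for xi in X]
    _, beta2 = fit_logistic(X_sel, y, lam=0.0)
    stage1 = {terms[k]: beta1[k] for k in keep}
    stage2 = {terms[k]: b for k, b in zip(keep, beta2)}
    return terms, stage1, stage2


rows, y = simulate()
terms, stage1, stage2 = two_stage(rows, y, ["x1", "x2", "x3", "x4"])
for name in stage2:
    print(f"{name:6s} penalized={stage1[name]: .3f} refit={stage2[name]: .3f}")
```

With four predictors the candidate set already holds ten terms (four main effects plus six pairwise products), which is why automated selection becomes attractive as the number of predictors grows. The sketch ignores within-subject correlation when fitting; a closer analogue of the methods compared in the article would replace both fits with a GLMM or GEE routine from a statistics package.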