Variable Selection and Estimation for Misclassified Binary Responses and Multivariate Error-Prone Predictors

Bibliographic Details
Published in	Journal of Computational and Graphical Statistics, Vol. 33, no. 2, pp. 407-420
Main Author	Chen, Li-Pang
Format	Journal Article
Language	English
Published	Alexandria: Taylor & Francis Ltd, 02.04.2024
Subjects	Algorithms; Boosting; Data analysis; Data collection; Error analysis; Error correction; Error elimination; Estimation; Feature selection; Machine learning; Measurement error; Random variables; Regression analysis; Regression calibration; Regression models; Statistical analysis; Supervised learning
Summary:	Classification is a long-standing topic in statistical analysis and supervised learning. A main goal is typically to use predictors to characterize a binary random variable of primary interest. Parametric structures such as logistic regression or probit models are commonly used to relate a binary response to predictors. However, because data are now easy to collect, non-informative variables and measurement error in both responses and predictors have become ubiquitous. The simultaneous presence of these features makes data analysis challenging. To address these concerns, we propose a valid inferential method that corrects for measurement error and performs variable selection simultaneously. Specifically, we focus on logistic regression and probit models and propose estimating functions that incorporate corrected responses and predictors. We then develop a boosting procedure that uses these error-eliminated estimating functions for variable selection and estimation. To justify the proposed method, we examine the convergence of the boosting algorithm and rigorously establish the theoretical results. Numerical studies show that the proposed method accurately retains informative predictors and gives precise estimators, and that it generally outperforms the same analysis without measurement error correction. Supplementary materials for this article, including proofs of theoretical results and computer code, are available online.
ISSN:	1061-8600, 1537-2715
DOI:	10.1080/10618600.2023.2218428
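
Illustrative sketch (not taken from the article or its supplementary code): the Summary describes correcting misclassified responses and error-prone predictors, then boosting on the resulting estimating functions. The Python below shows one simple way such a pipeline might look for logistic regression. Everything specific here is an assumption made for illustration: the misclassification probabilities p01 and p10 and the measurement error covariance sigma_e are treated as known, predictors are corrected by ordinary regression calibration under a classical additive error model, and the function names (correct_response, calibrate_predictors, boost), step size, and simulated data are hypothetical.

```python
import numpy as np

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def correct_response(y_obs, p01, p10):
    """Unbiased surrogate for the true response Y.

    Assumes known p01 = P(Y*=1 | Y=0) and p10 = P(Y*=0 | Y=1), so that
    E[(Y* - p01) / (1 - p01 - p10) | Y] = Y.
    """
    return (y_obs - p01) / (1.0 - p01 - p10)

def calibrate_predictors(W, sigma_e):
    """Regression calibration assuming W = X + e with Cov(e) = sigma_e known.

    Returns the best linear predictor of X given W.
    """
    mu = W.mean(axis=0)
    sigma_w = np.cov(W, rowvar=False)
    sigma_x = sigma_w - sigma_e              # implied covariance of the true X
    return mu + (W - mu) @ np.linalg.solve(sigma_w, sigma_x)

def boost(X, y, n_steps=300, eta=0.5):
    """Componentwise boosting on the logistic estimating function.

    Each step evaluates U(beta) = X^T (y - expit(X beta)) / n and updates
    only the coordinate with the largest |U_j|; untouched coordinates stay 0.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        U = X.T @ (y - expit(X @ beta)) / n
        j = np.argmax(np.abs(U))
        beta[j] += eta * U[j]
    return beta

# Hypothetical simulated data: 2 informative predictors out of 10.
rng = np.random.default_rng(0)
n, p = 2000, 10
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -1.5]
X = rng.normal(size=(n, p))
y = rng.binomial(1, expit(X @ beta_true))

# Misclassify the response and add measurement error to the predictors.
p01, p10 = 0.1, 0.2
flip = rng.random(n) < np.where(y == 1, p10, p01)
y_obs = np.where(flip, 1 - y, y)
sigma_e = 0.25 * np.eye(p)
W = X + rng.multivariate_normal(np.zeros(p), sigma_e, size=n)

# Correct both sources of error, then select and estimate by boosting.
beta_hat = boost(calibrate_predictors(W, sigma_e),
                 correct_response(y_obs, p01, p10))
selected = np.flatnonzero(beta_hat)          # indices touched by boosting
```

In this sketch, variable selection comes from the fact that coordinates never chosen by the boosting loop remain exactly zero; the number of steps acts as the tuning parameter. The article's actual estimating functions, stopping rule, and treatment of probit models may differ.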