Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index
Published in | Journal of Statistical Computation and Simulation, Vol. 92, no. 15, pp. 3222-3248 |
---|---|
Format | Journal Article |
Language | English |
Published | Abingdon: Taylor & Francis Ltd, 13.10.2022 |
Summary: | For ultrahigh dimensional data, we propose a model-free marginal feature screening procedure based on the integral Pearson chi-square (IPC) index; the procedure can handle continuous, categorical and discrete response variables. The IPC index can be regarded as an extension of the AD index studied by He et al. [A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Comput Stat Data Anal. 2019;137:155-169]. When the response variable is categorical, we extend He et al.'s work to allow a diverging number of response categories. However, the IPC index is difficult to estimate when the response is continuous, so we modify it and define the fused IPC index using the slice-and-fuse technique. Our feature screening procedure, which ranks features by the IPC or fused IPC index, is robust to heavy-tailed features and outliers. The sure screening property and the ranking consistency property are established for both categorical and continuous responses under mild conditions. The finite-sample performance of the proposed procedure is demonstrated through various numerical simulations and two real data applications. |
---|---|
ISSN: | 0094-9655, 1563-5163 |
DOI: | 10.1080/00949655.2022.2062358 |
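The summary does not state the formula for the IPC index, so what follows is only a minimal Python sketch under an assumed reading, not the authors' estimator. For a categorical response with R classes, it takes, at each observed value t of a feature X, the Pearson chi-square statistic of the 2 x R table formed by the indicator 1{X <= t} and Y; divided by n, that statistic reduces algebraically to sum_r p_r (F_r(t) - F(t))^2 / (F(t)(1 - F(t))), which is then averaged over the observed t. For a continuous response, Y is cut into roughly equal-sized slices by rank and the index is summed over several slice counts, as one rendering of the slice-and-fuse idea. The function names (`chi2_index`, `slice_response`, `fused_chi2_index`, `screen`), the slice counts (2, 3, 4, 5), setting the contribution of points with F(t) = 1 to zero, and the keep-the-top-d rule are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def chi2_index(x, y):
    """Chi-square-type dependence index between feature x and categorical y.

    ASSUMED form (illustrative, not the paper's exact estimator): for each
    observed t, the Pearson chi-square of the 2 x R table of 1{x <= t} vs y,
    divided by n, i.e. sum_r p_r (F_r(t) - F(t))^2 / (F(t)(1 - F(t))),
    averaged over the n observed values t.
    """
    n = len(x)
    class_sizes = Counter(y)
    total = 0.0
    for t in x:
        # Class counts among observations with x <= t (t itself is included, so F > 0).
        below = Counter(yi for xi, yi in zip(x, y) if xi <= t)
        F = sum(below.values()) / n  # empirical P(X <= t)
        if F >= 1.0:
            continue  # assumption: points with F(t) = 1 contribute zero
        stat = 0.0
        for r, n_r in class_sizes.items():
            p_r = n_r / n  # empirical P(Y = r)
            F_r = below[r] / n_r  # empirical P(X <= t | Y = r)
            stat += p_r * (F_r - F) ** 2
        total += stat / (F * (1.0 - F))
    return total / n


def slice_response(y, n_slices):
    """Label each y value with one of n_slices rank-based groups of near-equal size."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // n  # labels run from 0 to n_slices - 1
    return labels


def fused_chi2_index(x, y, slice_counts=(2, 3, 4, 5)):
    """Hypothetical fused index: sum the categorical index over several slicings of y."""
    return sum(chi2_index(x, slice_response(y, g)) for g in slice_counts)


def screen(features, y, d, categorical=True):
    """Return the indices of the d features (columns) with the largest index values."""
    if categorical:
        scores = [chi2_index(col, y) for col in features]
    else:
        scores = [fused_chi2_index(col, y) for col in features]
    return sorted(range(len(features)), key=lambda j: scores[j], reverse=True)[:d]


if __name__ == "__main__":
    random.seed(0)
    n, p = 150, 10
    features = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * features[0][i] + 0.1 * random.gauss(0, 1) for i in range(n)]
    print(screen(features, y, 3, categorical=False))
```

Because the sketch uses each feature only through comparisons of the form x <= t, its value is unchanged by any strictly increasing transformation of that feature, which is one way to see why indices of this kind resist heavy tails and outliers. The double loop costs O(n^2) per feature, which is acceptable for illustration but not for ultrahigh-dimensional data.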