Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index

Bibliographic Details
Published in: Journal of Statistical Computation and Simulation, Vol. 92, no. 15, pp. 3222-3248
Main Authors: Ma, Weidong; Xiao, Jingsong; Yang, Ying; Ye, Fei
Format: Journal Article
Language: English
Published: Abingdon: Taylor & Francis (Taylor & Francis Ltd), 13.10.2022

Summary: For ultrahigh-dimensional data, we propose a model-free marginal feature screening procedure, based on the integral Pearson chi-square (IPC) index, that can handle continuous, categorical and discrete response variables. The IPC index can be regarded as an extension of the AD index studied by He et al. [A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Comput Stat Data Anal. 2019;137:155-169]. When the response variable is categorical, we extend He et al.'s work to allow a diverging number of response categories. However, the IPC index is difficult to estimate when the response is continuous, so we modify it and define the fused IPC index using the slice-and-fuse technique. Our feature screening procedure, which ranks features by the IPC or fused IPC index, is robust to heavy-tailed features and outliers. The sure screening and ranking consistency properties are established for both categorical and continuous responses under mild conditions. The finite-sample performance of the proposed procedure is demonstrated through numerical simulations and two real data applications.
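The summary does not state the formula for the IPC index, so the Python sketch below is only an illustration under explicit assumptions, not the authors' estimator. It assumes an "integral Pearson chi-square" utility of the form: for each observed cutoff t, take Pearson's chi-square statistic of the 2 x R table of (1{X <= t}, Y), divide by n, and average over the observed cutoffs. It approximates "slice-and-fuse" by cutting a continuous response into g equal-count slices and summing the utility over several g. The function names `ipc_utility`, `slice_response`, `fused_ipc_utility` and `screen`, and the slice counts (2, 3, 4), are hypothetical.

```python
from collections import Counter


def ipc_utility(x, y):
    """Illustrative integrated Pearson chi-square utility of feature x for categorical y.

    Hypothetical definition (an assumption, not the paper's estimator): for each
    observed cutoff t, build the 2 x R table of counts for (1{x <= t}, y),
    compute Pearson's chi-square divided by n, and average over the n cutoffs
    (an empirical integral with respect to the distribution of x).
    """
    n = len(x)
    class_sizes = Counter(y)
    total = 0.0
    for t in x:
        below = Counter(yi for xi, yi in zip(x, y) if xi <= t)
        n_below = sum(below.values())
        n_above = n - n_below
        if n_above == 0:
            # Cutoff at the maximum of x: the "above" row is empty, so skip it.
            continue
        chi2 = 0.0
        for label, n_r in class_sizes.items():
            cells = ((below[label], n_below), (n_r - below[label], n_above))
            for observed, row_total in cells:
                expected = row_total * n_r / n  # expected count under independence
                chi2 += (observed - expected) ** 2 / expected
        total += chi2 / n
    return total / n


def slice_response(y, g):
    """Hypothetical slicing step: label each response by one of g equal-count rank slices."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * g // n
    return labels


def fused_ipc_utility(x, y, slice_counts=(2, 3, 4)):
    """Hypothetical fusion step: sum the sliced utilities over several slice counts."""
    return sum(ipc_utility(x, slice_response(y, g)) for g in slice_counts)


def screen(features, y, d, fused=False):
    """Return the indices of the d features with the largest utility."""
    score = fused_ipc_utility if fused else ipc_utility
    utilities = [score(column, y) for column in features]
    ranked = sorted(range(len(features)), key=lambda k: utilities[k], reverse=True)
    return ranked[:d]


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    x_signal = [1, 2, 3, 4, 5, 6, 7, 8]  # separates the two classes
    x_noise = [1, 5, 2, 6, 3, 7, 4, 8]   # classes interleaved
    print(screen([x_signal, x_noise], y, d=1))  # [0]
```

For a 2 x R table, the chi-square divided by n in this sketch equals the sum over classes r of p_r (F_r(t) - F(t))^2 / [F(t)(1 - F(t))], using empirical proportions and distribution functions. Because the utility depends on X only through the indicators 1{X <= t} at observed cutoffs, it is invariant to monotone transformations of X, which is one plausible reason a rank-based index of this kind is insensitive to heavy tails and outliers.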
ISSN: 0094-9655, 1563-5163
DOI: 10.1080/00949655.2022.2062358