Improving the Performance of Feature Selection Methods with Low-Sample-Size Data

Bibliographic Details
Published in: The Computer Journal, Vol. 66, No. 7, pp. 1664–1686
Main Authors: Zheng, Wanwan; Jin, Mingzhe
Format: Journal Article
Language: English
Published: Oxford University Press, 13 July 2023

Summary: Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in many real-world applications (e.g. neuroscience, bioinformatics and psychology). This study proposes a method, named feature selection based on data quality and variable training samples (QVT), to improve the performance of feature selection methods on ultra-low-sample-size data. Given that no single feature selection method performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be applied to any feature selection method. Furthermore, whereas existing methods try to extract a stable feature subset from low-sample-size data by increasing the sample size or using more complicated algorithms, QVT achieves its improvement using the original data. An experiment was performed using 20 benchmark datasets, three feature selection methods and three classifiers to verify the feasibility of QVT; the results showed that features selected by QVT achieve significantly higher classification accuracy than those selected by the underlying feature selection method alone.
ISSN: 0010-4620, 1460-2067
DOI: 10.1093/comjnl/bxac033