Improving the Performance of Feature Selection Methods with Low-Sample-Size Data
Published in | The Computer Journal, Vol. 66, No. 7, pp. 1664–1686
Main Authors | ,
Format | Journal Article
Language | English
Published | Oxford University Press, 13.07.2023
Summary: | Abstract
Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in several real-world applications (e.g. neuroscience, bioinformatics and psychology). This study proposes a method, named feature selection based on data quality and variable training samples (QVT), to improve the performance of feature selection methods on ultra-low-sample-size data. Given that no feature selection method performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be combined with any feature selection method. Furthermore, whereas existing approaches attempt to extract a stable feature subset from low-sample-size data by increasing the sample size or by using more complicated algorithms, QVT seeks improvement using only the original data. An experiment on 20 benchmark datasets with three feature selection methods and three classifiers verified the feasibility of QVT; the results showed that features selected by QVT achieve significantly higher classification accuracy than those selected by the underlying feature selection method alone. |
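The abstract describes QVT only at a high level, so the following Python sketch is not the authors' algorithm; it is a loose illustration of the general idea of wrapping an arbitrary feature selection method and running it on training samples of variable size drawn from the original data. The function name `select_with_variable_samples`, the choice of mutual information as the base scorer, the subsampling scheme and the vote-based aggregation are all assumptions made for illustration.

```python
# Hedged sketch: wrap a base feature-scoring method, run it on repeated
# subsamples of several sizes, and aggregate the features chosen most often.
# All names and design choices here are illustrative assumptions, not the
# QVT method from the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def select_with_variable_samples(X, y, base_scorer, sizes, n_repeats=20,
                                 n_features=10, rng=None):
    """Run `base_scorer` on subsamples of variable size and aggregate votes."""
    rng = np.random.default_rng(rng)
    votes = np.zeros(X.shape[1])
    for size in sizes:                          # variable training-sample sizes
        for _ in range(n_repeats):
            idx = rng.choice(len(y), size=size, replace=False)
            scores = base_scorer(X[idx], y[idx])
            top = np.argsort(scores)[::-1][:n_features]
            votes[top] += 1                     # count how often each feature is picked
    return np.argsort(votes)[::-1][:n_features]

if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    selected = select_with_variable_samples(
        X, y, mutual_info_classif, sizes=[20, 30, 40], n_features=10, rng=0)
    print("aggregated top features:", selected)
```

In this sketch any scorer that maps a training set to per-feature scores could replace `mutual_info_classif`, which mirrors the abstract's claim that the wrapper is agnostic to the underlying feature selection method.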
ISSN: | 0010-4620, 1460-2067
DOI: | 10.1093/comjnl/bxac033