Improving the Performance of Feature Selection Methods with Low-Sample-Size Data

Bibliographic Details
Published in: The Computer Journal, Vol. 66, No. 7, pp. 1664–1686
Main Authors: Zheng, Wanwan; Jin, Mingzhe
Format: Journal Article
Language: English
Published: Oxford University Press, 13 July 2023

Summary: Feature selection is a critical preprocessing step in machine learning that removes irrelevant and redundant data. Feature selection methods usually require sufficient samples to select a reliable feature subset, especially in the presence of outliers. However, sufficient samples cannot always be ensured in many real-world applications (e.g. neuroscience, bioinformatics and psychology). This study proposes a method, named feature selection based on data quality and variable training samples (QVT), to improve the performance of feature selection methods on ultra-low-sample-size data. Given that no single feature selection method performs optimally in all scenarios, QVT is primarily characterized by its versatility: it can be applied to any feature selection method. Furthermore, whereas existing methods try to extract a stable feature subset from low-sample-size data by increasing the sample size or using more complicated algorithms, QVT achieves its improvement using the original data. An experiment was performed using 20 benchmark datasets, three feature selection methods and three classifiers to verify the feasibility of QVT; the results showed that features selected by QVT achieve significantly higher classification accuracy than those selected by the underlying feature selection method alone.
ISSN: 0010-4620, 1460-2067
DOI: 10.1093/comjnl/bxac033