A Novel Framework for Fast Feature Selection Based on Multi-Stage Correlation Measures
Datasets with thousands of features represent a challenge for many of the existing learning methods because of the well known curse of dimensionality. Not only that, but the presence of irrelevant and redundant features on any dataset can degrade the performance of any model where training and infer...
Saved in:
Published in | Machine learning and knowledge extraction Vol. 4; no. 1; pp. 131 - 149 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published |
Basel
MDPI AG
01.03.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Datasets with thousands of features represent a challenge for many of the existing learning methods because of the well known curse of dimensionality. Not only that, but the presence of irrelevant and redundant features on any dataset can degrade the performance of any model where training and inference is attempted. In addition, in large datasets, the manual management of features tends to be impractical. Therefore, the increasing interest of developing frameworks for the automatic discovery and removal of useless features through the literature of Machine Learning. This is the reason why, in this paper, we propose a novel framework for selecting relevant features in supervised datasets based on a cascade of methods where speed and precision are in mind. This framework consists of a novel combination of Approximated and Simulate Annealing versions of the Maximal Information Coefficient (MIC) to generalize the simple linear relation between features. This process is performed in a series of steps by applying the MIC algorithms and cutoff strategies to remove irrelevant and redundant features. The framework is also designed to achieve a balance between accuracy and speed. To test the performance of the proposed framework, a series of experiments are conducted on a large battery of datasets from SPECTF Heart to Sonar data. The results show the balance of accuracy and speed that the proposed framework can achieve. |
---|---|
ISSN: | 2504-4990 2504-4990 |
DOI: | 10.3390/make4010007 |