A Novel Framework for Fast Feature Selection Based on Multi-Stage Correlation Measures

Bibliographic Details
Published in	Machine learning and knowledge extraction Vol. 4; no. 1; pp. 131 - 149
Main Authors	Garcia-Ramirez, Ivan-Alejandro; Calderon-Mora, Arturo; Mendez-Vazquez, Andres; Ortega-Cisneros, Susana; Reyes-Amezcua, Ivan
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.03.2022
Subjects	Algorithms; Datasets; Discriminant analysis; Feature selection; Machine learning; Performance degradation; python framework; Random variables

Summary:	Datasets with thousands of features are a challenge for many existing learning methods because of the well-known curse of dimensionality. In addition, irrelevant and redundant features in a dataset can degrade the performance of any model trained on it or used for inference. In large datasets, managing features manually also tends to be impractical. As a result, the Machine Learning literature shows increasing interest in frameworks for the automatic discovery and removal of useless features. In this paper, we propose a novel framework for selecting relevant features in supervised datasets, based on a cascade of methods designed with speed and precision in mind. The framework consists of a novel combination of Approximated and Simulated Annealing versions of the Maximal Information Coefficient (MIC) to generalize the simple linear relation between features. The process is performed in a series of steps that apply the MIC algorithms and cutoff strategies to remove irrelevant and redundant features. The framework is also designed to achieve a balance between accuracy and speed. To test its performance, a series of experiments is conducted on a large battery of datasets, from SPECTF Heart to Sonar data. The results show the balance of accuracy and speed that the proposed framework can achieve.
ISSN:	2504-4990
DOI:	10.3390/make4010007
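
Illustrative sketch (not part of the catalog record or the paper): the summary describes a cascade that scores features and applies cutoffs to remove irrelevant and then redundant ones. The Python below is a minimal sketch of that general idea only. It substitutes absolute Pearson correlation for MIC, which captures only the linear dependence that MIC is meant to generalize, and the names `abs_pearson` and `select_features` and both cutoff values are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np


def abs_pearson(a, b):
    """Stand-in dependence score in [0, 1]. ASSUMPTION: the paper uses MIC here."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else float(abs(a @ b) / denom)


def select_features(X, y, relevance_cutoff=0.2, redundancy_cutoff=0.9,
                    score=abs_pearson):
    """Two-stage filter: drop irrelevant features, then drop redundant ones.

    The cutoff values are arbitrary illustration defaults (assumptions).
    """
    # Stage 1: keep features whose dependence on the target reaches the cutoff.
    relevance = np.array([score(X[:, j], y) for j in range(X.shape[1])])
    candidates = [j for j in np.argsort(-relevance)
                  if relevance[j] >= relevance_cutoff]

    # Stage 2: walk candidates from most to least relevant and skip any
    # that depend too strongly on a feature that was already selected.
    selected = []
    for j in candidates:
        if all(score(X[:, j], X[:, k]) < redundancy_cutoff for k in selected):
            selected.append(int(j))
    return selected


# Synthetic data: one informative column, a near-duplicate of it, and noise.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
X = np.column_stack([
    signal,                               # relevant
    signal + 0.01 * rng.normal(size=n),   # redundant copy of column 0
    rng.normal(size=n),                   # irrelevant noise
])
y = 2.0 * signal + 0.1 * rng.normal(size=n)

print(select_features(X, y))
```

Passing a different `score` callable (for example one backed by an MIC implementation) would leave the two-stage structure unchanged. Whether that matches the staging used in the paper is not something this record confirms.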