Feature ranking based consensus clustering for feature subset selection


Bibliographic Details
Published in Applied intelligence (Dordrecht, Netherlands) Vol. 54; no. 17-18; pp. 8154-8169
Main Authors D, Sandhya Rani; T, Sobha Rani; S, Durga Bhavani; G, Bala Krishna
Format Journal Article
Language English
Published New York Springer US 01.09.2024
Springer Nature B.V
Summary: The feature subset selection problem is NP-hard, so there is a need for computationally efficient algorithms that find near-optimal feature subsets that improve the performance of a classifier. Two major challenges for feature subset selection are high-dimensional data, that is, data with a large number of features, and large datasets. Scalability of feature selection algorithms, in terms of accuracy on high-dimensional data and time taken on large datasets, is an important issue. We propose a consensus clustering based approach to feature selection that addresses these issues. Many computationally efficient greedy feature ranking algorithms exist in the literature, and each assigns a different ranking order to the features. A consensus among these rankings may provide a feature ranking that performs well with respect to time as well as accuracy. The goal of this work is to propose efficient algorithms that work on small as well as large datasets. The contributions of this work include: (i) a fast and scalable approach for feature selection, Feature Ranking based on Consensus Clustering (FRCC), designed using the available feature ranking algorithms from the literature; and (ii) a parallelizable version of FRCC, namely Hybrid Feature Selection (HFS), proposed to address feature reduction in large datasets. The implementation results show that FRCC clearly outperforms many recent algorithms in the literature on low- as well as high-dimensional datasets. HFS has been implemented on datasets with hundreds of thousands of instances and dimensionality in the hundreds and thousands. HFS proves to be very effective in terms of feature reduction and accuracy in comparison to the results obtained by recent algorithms in the literature.
ISSN: 0924-669X
1573-7497
DOI: 10.1007/s10489-024-05566-z
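
Illustrative note: the summary describes forming a consensus among the rankings produced by several feature ranking algorithms. The sketch below is not the authors' FRCC method, whose consensus clustering step is not detailed in this record. It substitutes a much simpler mean-position (Borda-style) aggregation, only to show what "a consensus among these rankings" can look like in practice. The function name `consensus_ranking`, the feature names, and the three input rankings are all hypothetical.

```python
from statistics import mean


def consensus_ranking(rankings):
    """Merge several feature rankings (best feature first) into one.

    Assumption: plain mean-position aggregation, used here as a stand-in
    for the consensus clustering step of FRCC, which this record does not
    describe in detail.
    """
    features = rankings[0]
    if any(sorted(r) != sorted(features) for r in rankings):
        raise ValueError("every ranking must order the same set of features")
    # Average position of each feature across all rankings (0 = best).
    mean_pos = {f: mean(r.index(f) for r in rankings) for f in features}
    # Lower mean position ranks higher; ties are broken by feature name.
    return sorted(features, key=lambda f: (mean_pos[f], f))


# Hypothetical outputs of three different greedy ranking algorithms.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]

consensus = consensus_ranking(rankings)
top_k = consensus[:2]  # keep the two highest-ranked features as the subset
print(consensus, top_k)
```

Mean-position aggregation runs in time linear in the number of rankings for each feature lookup, which fits the record's emphasis on computational efficiency, but whether it matches FRCC in accuracy is not something this record supports.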