An efficient hybrid filter-wrapper method based on improved Harris Hawks optimization for feature selection

Bibliographic Details
Published in BioImpacts : BI Vol. 15; no. 1; pp. 30340 - 14
Main Authors Pirgazi, Jamshid, Pourhashem Kallehbasti, Mohammad Mehdi, Ghanbari Sorkhi, Ali, Kermani, Ali
Format Journal Article
Language English
Published Iran Tabriz University of Medical Sciences 01.10.2024
Tabriz University of Medical Sciences (TUOMS Publishing Group)
Summary:
Introduction: High-dimensional datasets often contain an abundance of features, many of which are irrelevant to the subject of interest. This issue is compounded by the frequently low number of samples and imbalanced class samples. These factors can negatively impact the performance of classification algorithms, necessitating feature selection before classification. The primary objective of feature selection algorithms is to identify a minimal subset of features that enables accurate classification.
Methods: In this paper, we propose a two-stage hybrid method for the optimal selection of relevant features. In the first stage, a filter method is employed to assign weights to the features, facilitating the removal of redundant and irrelevant features and reducing the computational cost of classification algorithms. A subset of high-weight features is retained for further processing in the second stage. In this stage, an enhanced Harris Hawks Optimization algorithm and GRASP, augmented with crossover and mutation operators from genetic algorithms, are utilized based on the weights calculated in the first stage to identify the optimal feature set.
Results: Experimental results demonstrate that the proposed algorithm successfully identifies the optimal subset of features.
Conclusion: The two-stage hybrid method effectively selects the optimal subset of features, improving the performance of classification algorithms on high-dimensional datasets. This approach addresses the challenges posed by the abundance of features, low number of samples, and imbalanced class samples, demonstrating its potential for application in various fields.
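
Illustrative sketch (not from the article): the two-stage filter-then-wrapper idea described under Methods can be outlined in a few lines of Python. Everything specific here is an assumption made for illustration. The filter score is the absolute Pearson correlation with the label (the summary does not name the filter), the wrapper is a plain genetic-style search with weight-guided initialisation, uniform crossover and bit-flip mutation (a simple stand-in, not the authors' enhanced Harris Hawks Optimization or GRASP), and fitness is leave-one-out 1-nearest-neighbour accuracy minus a small size penalty. All function names and parameters are hypothetical.

```python
import random
from statistics import mean, pstdev


def filter_weights(X, y):
    """Stage 1 (assumed filter): |Pearson correlation| of each feature with the label."""
    my, sy = mean(y), pstdev(y)
    weights = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx, sx = mean(col), pstdev(col)
        if sx == 0 or sy == 0:
            weights.append(0.0)  # a constant column carries no information
            continue
        cov = mean([(a - mx) * (b - my) for a, b in zip(col, y)])
        weights.append(abs(cov / (sx * sy)))
    return weights


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[f] - X[k][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)


def wrapper_search(X, y, candidates, weights,
                   pop_size=12, generations=20, alpha=0.05, seed=0):
    """Stage 2 (stand-in search): genetic-style search over subsets of the candidates."""
    rng = random.Random(seed)
    wmax = max(weights[f] for f in candidates) or 1.0

    def fitness(mask):
        feats = [f for f, bit in zip(candidates, mask) if bit]
        # Reward accuracy, penalise large subsets.
        return loo_1nn_accuracy(X, y, feats) - alpha * len(feats) / len(candidates)

    # Weight-guided initialisation: high-weight features are more likely to start selected.
    pop = [[rng.random() < weights[f] / wmax for f in candidates]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not bit) if rng.random() < 0.1 else bit
                     for bit in child]                        # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]


def select_features(X, y, keep=5, **search_options):
    """Run the filter, keep the `keep` highest-weight features, then run the wrapper."""
    weights = filter_weights(X, y)
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return wrapper_search(X, y, ranked[:keep], weights, **search_options)


# Synthetic demo data: feature 0 copies the label, features 1..7 are Gaussian noise.
demo_rng = random.Random(42)
y = [i % 2 for i in range(20)]
X = [[float(label)] + [demo_rng.gauss(0, 1) for _ in range(7)] for label in y]
selected = select_features(X, y, keep=5)
print(selected)
```

The filter stage shrinks the search space before the costlier wrapper stage evaluates any classifier, which is the computational motivation the summary gives for the hybrid design.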
ISSN: 2228-5652
2228-5660
DOI: 10.34172/bi.30340