A Tent Lévy Flying Sparrow Search Algorithm for Wrapper-Based Feature Selection: A COVID-19 Case Study

Bibliographic Details
Published in	Symmetry (Basel) Vol. 15; no. 2; p. 316
Main Authors	Yang, Qinwen; Gao, Yuelin; Song, Yanjie
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.01.2023
Subjects	Algorithms; Analysis; Asymmetry; Case studies; Classification; Coronaviruses; COVID-19; Data mining; Datasets; Feature selection; Genetic algorithms; Heuristic; Information science; Methods; Neural networks; Optimization techniques; Parameter identification; Performance evaluation; Search algorithms; Viral diseases; China; Germany
Summary:	The “curse of dimensionality” that accompanies the rapid development of information science can degrade performance on large datasets, and it also makes problems of symmetry and asymmetry increasingly prominent. Feature selection (FS) can eliminate irrelevant information in big data and improve accuracy. The recently proposed Sparrow Search Algorithm (SSA) has shown advantages in FS tasks because of its superior performance. However, SSA is prone to poor population diversity and tends to fall into local optima. To address this issue, we propose a variant of SSA, the Tent Lévy Flying Sparrow Search Algorithm (TFSSA), to select the best feature subset in a wrapper-based method for classification. After its performance is evaluated on the CEC2020 test suite, TFSSA is used to select the feature combination that maximizes classification accuracy while minimizing the number of selected features. To evaluate the proposed TFSSA, we conducted experiments on twenty-one datasets from the UCI repository and compared it with nine algorithms from the literature, using nine metrics to assess performance. The method is also applied to a coronavirus disease (COVID-19) dataset, where its classification accuracy and average number of selected features are 93.47% and 2.1, respectively, the best among the compared algorithms. The experimental results and comparisons on all datasets demonstrate the effectiveness of TFSSA relative to other wrapper-based algorithms.
ISSN:	2073-8994
DOI:	10.3390/sym15020316
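
Illustrative sketch (not taken from the article): the summary names three ingredients, a Tent chaotic map, Lévy flights, and a wrapper objective that trades classification accuracy against the number of selected features, but it gives no formulas. The Python below shows one common way such pieces are written and should be read as a set of assumptions, not as the authors' TFSSA. In particular, the skewed tent map with peak `a = 0.7`, Mantegna's method with `beta = 1.5`, the 0.5 binarization threshold, the weight `alpha = 0.99`, the leave-one-out 1-nearest-neighbour classifier, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def tent_sequence(x0, n, a=0.7):
    """Generate n values of a skewed tent map on [0, 1].

    Assumption: the peak position a = 0.7 is chosen here only because the
    symmetric map (a = 0.5) degenerates quickly in binary floating point.
    """
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step using Mantegna's method (assumed beta)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    # The small constant only guards against division by zero.
    return u / (abs(v) ** (1 / beta) + 1e-12)


def to_mask(position, threshold=0.5):
    """Binarize a continuous position into a feature mask (assumed rule)."""
    return [1 if p > threshold else 0 for p in position]


def loo_1nn_error(X, y, mask):
    """Leave-one-out error of a 1-nearest-neighbour classifier that only
    looks at the columns selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # no feature selected: treat as the worst case
    wrong = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        wrong += best_label != y[i]
    return wrong / len(X)


def wrapper_fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted sum of error rate and feature ratio."""
    ratio = sum(mask) / len(mask)
    return alpha * loo_1nn_error(X, y, mask) + (1 - alpha) * ratio


# Toy data: column 0 separates the classes, column 1 is a distractor.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
start = tent_sequence(0.37, len(X[0]))          # chaotic initial position
moved = [p + 0.01 * levy_step(rng) for p in start]  # one Lévy perturbation
print(to_mask(start), to_mask(moved))
print(wrapper_fitness(X, y, [1, 0]), wrapper_fitness(X, y, [1, 1]))
```

On the toy data, the mask that keeps only the informative column scores a lower (better) fitness than the mask that keeps both columns, which is the behaviour a wrapper objective of this form is meant to reward.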