Tri-staged feature selection in multi-class heterogeneous datasets using memetic algorithm and cuckoo search optimization


Bibliographic Details
Published in Expert systems with applications Vol. 209; p. 118286
Main Authors Devi Priya, R., Sivaraj, R., Anitha, N., Devisurya, V.
Format Journal Article
Language English
Published Elsevier Ltd 15.12.2022
Summary:
• Proposes Tri-Staged Feature Selection (TFS) for multi-class heterogeneous datasets.
• Initial features are selected using the Kruskal Wallis test.
• The obtained features are refined using a Memetic Algorithm with local beam search.
• The final feature set is refined using the Cuckoo Search algorithm for better classification.
• Experiments are conducted on 12 real datasets to validate the proposed method.

Classification algorithms and their preprocessing operations usually perform feature selection separately on homogeneous or heterogeneous attributes and on binary or multi-class labels. Very few methods attempt feature selection on datasets with heterogeneous multi-class attributes. To bridge this gap with better classification performance, the paper proposes a Tri-staged Feature Selection (TFS) methodology which performs (i) feature selection using the Kruskal Wallis test, (ii) refinement of the feature selection using a new Memetic Algorithm with local beam search and genetic algorithm operations, and (iii) further refinement of the feature selection using the Cuckoo Search algorithm. The proposed method maintains a proper tradeoff between exploration and exploitation. Experimental results on 12 datasets show that the proposed method outperforms state-of-the-art feature selection methods in terms of multi-class accuracy, hamming loss, ranking loss, normalized coverage and convergence rate on multi-class heterogeneous datasets.
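As an illustration of stage (i) only, the following is a minimal sketch of a Kruskal Wallis filter written in plain Python. It is not the authors' implementation: the function names (`average_ranks`, `kruskal_h`, `select_top_k`), the top-k cutoff, and the toy data are assumptions made for this example, and the record does not say how the paper encodes heterogeneous (for example categorical) attributes before ranking. The sketch assumes each feature is numeric or ordinal.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(values, labels):
    """Kruskal Wallis H statistic of one feature against class labels."""
    n = len(values)
    if n < 2:
        return 0.0
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(ranks, labels):
        rank_sums[label] += rank
        counts[label] += 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[c] ** 2 / counts[c] for c in counts
    ) - 3 * (n + 1)
    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = defaultdict(int)
    for v in values:
        tie_sizes[v] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    # A constant feature gives correction == 0 and carries no class information.
    return h / correction if correction > 0 else 0.0


def select_top_k(columns, labels, k):
    """Keep the k feature names with the largest H (hypothetical cutoff rule)."""
    scores = {name: kruskal_h(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy three-class data, invented for this example.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
columns = {
    "informative": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "noise": [5, 1, 9, 2, 8, 4, 7, 3, 6],
}
print(select_top_k(columns, labels, 1))
```

A larger H means the feature's ranks differ more across classes, so the filter keeps the highest-scoring features. In a pipeline like the one summarized above, the surviving features would then be handed to the memetic and Cuckoo Search stages, which this sketch does not cover.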
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.118286