Feature selection based on artificial bee colony and gradient boosting decision tree

Bibliographic Details
Published in Applied Soft Computing, Vol. 74, pp. 634-642
Main Authors Rao, Haidi; Shi, Xianzhang; Rodrigue, Ahoussou Kouassi; Feng, Juanjuan; Xia, Yingchun; Elhoseny, Mohamed; Yuan, Xiaohui; Gu, Lichuan
Format Journal Article
Language English
Published Elsevier B.V. 01.01.2019
Summary: Data from many real-world applications can be high dimensional, and the features of such data are usually highly redundant. Identifying informative features has become an important step in data mining, not only to circumvent the curse of dimensionality but also to reduce the amount of data for processing. In this paper, we propose a novel feature selection method based on bee colony and gradient boosting decision tree, aiming to address problems such as efficiency and the informative quality of the selected features. Our method achieves global optimization of the inputs of the decision tree using the bee colony algorithm to identify the informative features. The method initializes the feature space spanned by the dataset. Less relevant features are suppressed, using an artificial bee colony algorithm, according to the information they contribute to the decision making. Experiments are conducted with two breast cancer datasets and six datasets from the public data repository. Experimental results demonstrate that the proposed method effectively reduces the dimensions of the dataset and achieves superior classification accuracy using the selected features.
• A novel method for feature selection based on bee colony and decision tree.
• The proposed method improves efficiency and informative quality of the selected features.
• Experiments conducted with breast cancer datasets demonstrate superior performance.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2018.10.036
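
The summary describes an artificial bee colony search over feature subsets, scored by a decision-tree model. The record gives no algorithmic detail, so the following is only a minimal sketch of a generic binary artificial bee colony loop, not the authors' implementation. The names `abc_feature_selection` and `toy_fitness` and the parameters `n_sources`, `limit`, and `n_iter` are hypothetical. The toy fitness function is an assumption that stands in for a model-based score such as the validation accuracy of a gradient boosting decision tree trained on the selected features.

```python
import random


def abc_feature_selection(n_features, fitness, n_sources=10, limit=5,
                          n_iter=30, seed=0):
    """Binary artificial bee colony search over feature masks (sketch).

    `fitness` maps a list of booleans (True = keep the feature) to a score
    to maximize. Returns the best mask seen and its score.
    """
    rng = random.Random(seed)

    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):  # always keep at least one feature
            mask[rng.randrange(n_features)] = True
        return mask

    def neighbour(mask):
        # Flip one randomly chosen bit; undo the flip if it empties the mask.
        new = list(mask)
        j = rng.randrange(n_features)
        new[j] = not new[j]
        if not any(new):
            new[j] = True
        return new

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_sources
    best = {"mask": None, "score": float("-inf")}

    def record(i):
        # Memorize the best food source found so far.
        if scores[i] > best["score"]:
            best["mask"], best["score"] = list(sources[i]), scores[i]

    def try_improve(i):
        # Greedy replacement: keep the neighbour only if it scores higher.
        cand = neighbour(sources[i])
        s = fitness(cand)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1
        record(i)

    for i in range(n_sources):
        record(i)

    for _ in range(n_iter):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability tied to fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                record(i)

    return best["mask"], best["score"]


# Toy stand-in for a model-based score (assumption): features 0, 3 and 5
# are "informative"; every other selected feature costs a small penalty.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for j, keep in enumerate(mask) if keep and j in INFORMATIVE)
    extras = sum(1 for j, keep in enumerate(mask) if keep and j not in INFORMATIVE)
    return hits - 0.1 * extras


mask, score = abc_feature_selection(10, toy_fitness)
print([j for j, keep in enumerate(mask) if keep], score)
```

To approximate the setting in the summary, `toy_fitness` would be replaced by a function that trains a gradient boosting classifier on the columns selected by the mask and returns its cross-validated accuracy. Because that evaluation is costly, caching the scores of masks already visited is a common design choice.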