A Generic-Driven Wrapper Embedded With Feature-Type-Aware Hybrid Bayesian Classifier for Breast Cancer Classification

Bibliographic Details
Published in IEEE Access, Vol. 7, pp. 119931-119942
Main Authors Wuniri, Qiqige; Huangfu, Wei; Liu, Yaxi; Lin, Xiaoli; Liu, Liyuan; Yu, Zhigang
Format Journal Article
Language English
Published Piscataway IEEE 2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Breast cancer is one of the most common cancers diagnosed in women. For preventive diagnosis, feature selection is an essential step in constructing a breast cancer classifier. The features of a real breast cancer dataset usually include both discrete and continuous ones. Also, the Area Under the Curve (AUC) of the receiver operating characteristic receives more attention in such a medical field. The existing research work does not sufficiently take into account both the hybrid trait of the features and the specific classification objective. We have proposed a wrapper method, i.e., an integrated framework in which Bayesian classifiers are embedded for the feature selection of breast cancer datasets. To deal with both the discrete features and the continuous features, we adopt the naive approach for the discrete features and kernel probability density estimation for the continuous ones, which leads to feature-type-aware hybrid Bayesian classifiers. All the classifiers are fed with different feature subsets and evaluated by their AUC metrics as the fitness indexes. Thus, with the genetic algorithm, we can obtain a near-optimal feature subset, which yields a good AUC metric with its corresponding classifiers. Moreover, the one-class F-score is used to help enhance the convergence of the algorithm. Experiments are done both with the continuous Wisconsin diagnostic breast cancer dataset and the real breast cancer dataset for Chinese women. The results prove that the proposed wrapper is feasible, accurate and efficient, compared with the related genetic algorithm based approaches.
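The pipeline described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: every function name, the fixed KDE bandwidth, the Laplace smoothing, the truncation selection, and the one-point crossover are assumptions chosen for brevity, and the one-class F-score step is omitted. The sketch assumes both classes appear in the training and test data and that there are at least two features.

```python
import math
import random

def kde_log_density(x, samples, bandwidth):
    # Gaussian kernel density estimate of p(x) from 1-D samples, in log space.
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return math.log(total / norm + 1e-300)  # tiny floor avoids log(0)

def discrete_log_prob(x, samples, n_values):
    # Laplace-smoothed relative frequency of value x (assumed smoothing).
    return math.log((samples.count(x) + 1) / (len(samples) + n_values))

def hybrid_score(row, train_X, train_y, is_discrete, subset, bandwidth=0.5):
    # Log-odds of class 1 versus class 0 under the naive independence assumption.
    score = 0.0
    for label, sign in ((1, 1.0), (0, -1.0)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        logp = math.log(len(rows) / len(train_X))  # class prior
        for j in subset:
            col = [r[j] for r in rows]
            if is_discrete[j]:
                n_values = len({r[j] for r in train_X})
                logp += discrete_log_prob(row[j], col, n_values)
            else:
                logp += kde_log_density(row[j], col, bandwidth)
        score += sign * logp
    return score

def auc(scores, labels):
    # Probability that a random positive outranks a random negative (ties count 0.5).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(mask, train, test, is_discrete):
    # AUC of the hybrid classifier restricted to the features selected by mask.
    subset = [j for j, bit in enumerate(mask) if bit]
    if not subset:
        return 0.0
    (Xtr, ytr), (Xte, yte) = train, test
    scores = [hybrid_score(r, Xtr, ytr, is_discrete, subset) for r in Xte]
    return auc(scores, yte)

def genetic_search(n_features, fit, pop_size=12, generations=15, p_mut=0.1, seed=0):
    # Simple generational GA over bit masks; the operators are assumptions.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=fit)
```

A caller would wrap `fitness` in a one-argument function, for example `genetic_search(3, lambda m: fitness(m, train, test, is_discrete))`, where `train` and `test` are `(rows, labels)` pairs and `is_discrete` flags each column. Because the best half of each generation is carried over unchanged, the best AUC found never decreases between generations.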
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2932505