A Generic-Driven Wrapper Embedded With Feature-Type-Aware Hybrid Bayesian Classifier for Breast Cancer Classification
Published in | IEEE Access, Vol. 7, pp. 119931-119942 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE, 2019; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | Breast cancer is one of the most common cancers diagnosed in women. For preventive diagnosis, feature selection is an essential step in constructing a breast cancer classifier. The features of a real breast cancer dataset are usually a mix of discrete and continuous ones. Also, the Area Under the Curve (AUC) of the receiver operating characteristic receives particular attention in the medical field. Existing research does not sufficiently account for both the hybrid nature of the features and this specific classification objective. We propose a wrapper method, i.e., an integrated framework in which Bayesian classifiers are embedded, for the feature selection of breast cancer datasets. To handle both feature types, we adopt the naive approach for the discrete features and kernel probability density estimation for the continuous ones, which leads to feature-type-aware hybrid Bayesian classifiers. The classifiers are fed with different feature subsets and evaluated with their AUC as the fitness index. Thus, with the genetic algorithm, we can obtain a near-optimal feature subset that yields a good AUC with its corresponding classifier. Moreover, the one-class F-score is used to improve the convergence of the algorithm. Experiments are conducted on both the continuous Wisconsin diagnostic breast cancer dataset and a real breast cancer dataset for Chinese women. The results show that the proposed wrapper is feasible, accurate, and efficient compared with related genetic-algorithm-based approaches. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2019.2932505 |
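
The summary describes a Bayesian classifier that treats discrete and continuous features differently and is scored by AUC on a candidate feature subset. The following is a minimal sketch of that idea, not the authors' implementation. The Gaussian kernel, the Silverman-style bandwidth, the Laplace smoothing, the toy data, and all names (`HybridNaiveBayes`, `fitness`, and so on) are assumptions made for illustration. The genetic algorithm and the one-class F-score are omitted; `fitness` only shows how one feature mask could be scored.

```python
import math
from collections import Counter


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (an assumption; the record names no rule)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / max(n - 1, 1))
    return max(1.06 * std * n ** (-0.2), 1e-3)


def gaussian_kde_pdf(x, samples, bandwidth):
    """Kernel density estimate at x with Gaussian kernels."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


class HybridNaiveBayes:
    """Naive Bayes: smoothed frequencies for discrete features, KDE for continuous."""

    def __init__(self, discrete_idx):
        self.discrete_idx = set(discrete_idx)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: sum(1 for v in y if v == c) / len(y) for c in self.classes}
        self.models = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            for j in range(len(X[0])):
                col = [r[j] for r in rows]
                if j in self.discrete_idx:
                    n_values = len({x[j] for x in X})  # distinct values overall
                    self.models[c, j] = (Counter(col), len(col), n_values)
                else:
                    self.models[c, j] = (col, silverman_bandwidth(col))
        return self

    def _log_joint(self, x, c, features):
        total = math.log(self.prior[c])
        for j in features:
            if j in self.discrete_idx:
                counts, n, k = self.models[c, j]
                p = (counts[x[j]] + 1) / (n + k)  # Laplace smoothing
            else:
                samples, h = self.models[c, j]
                p = gaussian_kde_pdf(x[j], samples, h)
            total += math.log(max(p, 1e-300))  # guard against log(0)
        return total

    def score(self, x, positive, features):
        """Posterior probability of the positive class using only `features`."""
        logs = {c: self._log_joint(x, c, features) for c in self.classes}
        top = max(logs.values())
        exps = {c: math.exp(v - top) for c, v in logs.items()}
        return exps[positive] / sum(exps.values())


def auc(scores, labels, positive):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(model, X, y, mask, positive=1):
    """AUC of the classifier restricted to the features selected by a 0/1 mask."""
    features = [j for j, keep in enumerate(mask) if keep]
    if not features:
        return 0.0
    return auc([model.score(x, positive, features) for x in X], y, positive)


# Made-up toy data: column 0 is discrete, column 1 is continuous.
X = [(0, 1.0), (0, 1.2), (1, 0.9), (1, 3.0), (1, 3.2), (0, 3.1)]
y = [0, 0, 0, 1, 1, 1]
model = HybridNaiveBayes(discrete_idx=[0]).fit(X, y)
auc_both = fitness(model, X, y, mask=(1, 1))
auc_discrete_only = fitness(model, X, y, mask=(1, 0))
```

Because naive Bayes factorizes over features, a model fitted once on all features can be evaluated on any subset by skipping the unselected terms. A search procedure such as a genetic algorithm could therefore call `fitness` on many masks without refitting. In practice the AUC would be computed on held-out data, not on the training rows used here.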