Predicting breast cancer survivability based on machine learning and features selection algorithms: a comparative study

Bibliographic Details
Published in	Journal of ambient intelligence and humanized computing Vol. 12; no. 8; pp. 8585 - 8623
Main Author	El_Rahman, Sahar A.
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2021; Springer Nature B.V.
Subjects	Accuracy; Algorithms; Artificial Intelligence; Breast cancer; Classification; Classifiers; Comparative studies; Computational Intelligence; Datasets; Decision trees; Discriminant analysis; Disease control; Engineering; Feature selection; Females; Genetic algorithms; Kernel functions; Lymphoma; Machine learning; Mammography; Medical prognosis; Medical research; Metastasis; Neural networks; Original Research; Prostate; Radial basis function; Researchers; Robotics and Automation; Skin cancer; Support vector machines; Survivability; Tumors; User Interfaces and Human Computer Interaction; Uterus; Womens health; United States > US; Egypt; Machine learning algorithms; WPBC; Breast cancer (BC); WDBC; BI-RADS; WBC

Summary:	Breast cancer (BC) is considered the most common cause of cancer deaths in women. This study aims to identify BC early using machine learning algorithms and feature selection methods. The overall methodology is adapted from the knowledge data discovery (KDD) process and comprises four datasets, a preprocessing phase (data cleaning and splitting the data into training and testing sets), a processing phase (feature selection, k-fold validation, and classification), and finally model evaluation. The paper compares classifiers including decision tree (DT), random forest (RF), logistic regression (LR), Naïve Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM). Four breast cancer datasets are used in the experiments: Wisconsin prognosis breast cancer (WPBC), Wisconsin diagnosis breast cancer (WDBC), Wisconsin Breast Cancer (WBC), and the Mammographic Mass Dataset (MM-Dataset), which is based on BI-RADS findings. The proposed models were evaluated using classification accuracy and the confusion matrix. On the WBC dataset, RF with the Genetic Algorithm (GA) as the feature selection method outperforms the other classifiers, with an accuracy of 96.82%. On the WDBC dataset, C-SVM with the radial basis function (RBF) kernel is superior to the other classifiers, with an accuracy of 99.04%. On the WPBC dataset, RF with recursive feature elimination (RFE) as the feature selection method performs best, with an accuracy of 74.13%. On the MM-Dataset, DT performs best, with an accuracy of 83.74%. The findings indicate that the proposed models are effective in comparison with other existing models.
ISSN:	1868-5137 1868-5145
DOI:	10.1007/s12652-020-02590-y
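
Illustrative note: the summary describes an evaluation loop built on k-fold validation, a classifier, and accuracy derived from a confusion matrix. The sketch below shows one way those pieces could fit together using only the Python standard library. It is not the author's implementation: the function names, the choice of KNN with three neighbors, the five folds, and the synthetic two-cluster data are all assumptions made for illustration, and the printed numbers have no relation to the accuracies reported in the article.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Predict the label of x by majority vote among its nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:n_neighbors])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, positive=1):
    """Return the counts (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def cross_validate(X, y, k=5, n_neighbors=3):
    """Predict every sample once, each time training on the other k-1 folds."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], n_neighbors))
    return confusion_matrix(y_true, y_pred)


# Synthetic toy data (NOT taken from WBC, WDBC, WPBC, or MM-Dataset):
# two well-separated Gaussian clusters standing in for two classes.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

tp, fp, fn, tn = cross_validate(X, y)
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
```

Pooling the predictions from all folds into a single confusion matrix is one common convention; averaging per-fold accuracies is another, and the summary does not say which the study used.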