An improved breast cancer disease prediction system using ML and PCA

Bibliographic Details
Published in	Multimedia tools and applications Vol. 83; no. 11; pp. 33785 - 33821
Main Authors	Laghmati, Sara, Hamida, Soufiane, Hicham, Khadija, Cherradi, Bouchaib, Tmiri, Amal
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2024; Springer Nature B.V.
Subjects	Accuracy; Breast cancer; Classification; Computer Communication Networks; Computer Science; Correlation coefficients; Data Structures and Information Theory; Datasets; Machine learning; Mammography; Medical research; Multimedia Information Systems; Performance evaluation; Recall; Regression analysis; Regression models; Special Purpose and Application-Based Systems; Supervised learning; Track 2: Medical Applications of Multimedia; CAD systems for breast cancer prediction; Machine Learning; Artificial Intelligence; Feature selection; Stacking with logistic regression; PCA; Majority voting; Flask
Summary:	Computer-aided diagnosis (CAD) systems based on machine learning (ML) techniques have transformed the field of medical research. Deploying such models to classify breast cancer is one of many areas where accuracy has been the main concern. CAD systems aim to match the performance of trained clinicians in identifying breast cancer at its early stages, thus improving outcomes for breast cancer patients while reducing the cost of treatment. This paper presents a supervised ML CAD system for breast cancer classification based on feature selection, principal component analysis (PCA), grid search for hyperparameter tuning, and cross-validation. The system draws on seven ML classifiers: ANN, k-NN, SVM, DT, RF, XGBoost, and AdaBoost. Two ensemble models were developed by combining the predictions of the individual ML models, one using majority voting and one using stacking with logistic regression (S-LR), to produce the final prediction. The system's performance is evaluated with several metrics, mainly accuracy, specificity, precision, recall, Matthews correlation coefficient, Jaccard index, and F1-score. The datasets used are the Wisconsin breast cancer dataset (WBCD) and the Mammographic Mass dataset. The results indicate that the XGBoost model achieved the highest recall, over 96%, on the Mammographic Mass dataset, while on the WBCD both the AdaBoost and the S-LR models outperformed the others with a recall of 95.35%. The S-LR ensemble model obtained the highest accuracies: 93.37% on the Mammographic Mass dataset and 97.37% on the WBCD. Accordingly, the proposed model can be suggested as an aid to decision-making in classifying breast cancer tumors, and a Flask application using the S-LR model was developed.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-023-16874-w
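
The summary mentions majority voting over several classifiers and a set of evaluation metrics computed from the final predictions. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation. The function names `majority_vote` and `binary_metrics`, the three "models", and all labels are made up for illustration; none of the numbers come from the paper. An odd number of models is assumed so that binary votes cannot tie.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample (hypothetical helper)."""
    # zip(*predictions) groups the labels that all models assigned to one sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the metrics named in the summary from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy labels (1 = malignant, 0 = benign); purely illustrative, not real data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 1, 1, 0]
model_b = [1, 0, 1, 0, 1, 1, 1, 0]
model_c = [1, 1, 1, 0, 0, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, ensemble)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case each model makes two or three errors on its own, while the voted prediction makes only one, a false positive. Stacking differs from voting in that a second-level model (logistic regression in the S-LR ensemble described above) is trained on the base models' predictions instead of counting votes; that step is not shown here.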