Breast Carcinoma Prediction Through Integration of Machine Learning Models

Bibliographic Details
Published in	IEEE Access, Vol. 12, pp. 134635-134650
Main Authors	Martinez-Licort, Rosmeri; de la Cruz Leon, Carlos; Agarwal, Deevyankar; Sahelices, Benjamin; de la Torre, Isabel; Pablo Miramontes-Gonzalez, Jose; Amoon, Mohammed
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 2024 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects	Accuracy; Algorithms; Analytical models; Breast cancer; Data models; Datasets; Diagnosis; Ensemble learning; Machine learning; Majority voting; Performance evaluation; Principal component analysis; Public health; Support vector machines; Training
Summary:	Breast cancer poses a global health challenge, with high incidence and mortality rates. Early detection and precise diagnosis are crucial for patient prognosis. Machine learning (ML) models applied to mammary biopsy image data hold promise for efficient and accurate breast cancer diagnosis. In this study, we evaluate the performance of several ML algorithms, including Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB) and Support Vector Machine (SVM). We establish evaluation contexts by standardizing the data and reducing the correlation between variables. First, we build and evaluate the individual models and select the best-performing parameters for each algorithm. Then, we implement a combined model using weighted voting, in which the weight of each model is determined by its performance on the test dataset. The final model combines the LR, RF and SVM models. SVM is the best-performing individual model, so it receives the highest weight in the final model. The final integrated model achieves an accuracy of 98%, a precision of 97%, a recall of 99%, an F1-score of 98% and an AUC of 0.98. Our weighted voting model compares favourably with the other models analysed. The approach is efficient and transparent in handling structured medical data. It is a prototype that will be refined and expanded to encompass larger real-world datasets.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3431998
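
The weighted voting step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard (label-level) voting, weights proportional to a single accuracy-like score per model, and toy labels and scores invented purely for illustration. The function names `normalize_weights`, `weighted_vote` and `binary_metrics` are hypothetical.

```python
from collections import defaultdict


def normalize_weights(scores):
    """Turn per-model performance scores into weights that sum to 1."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def weighted_vote(predictions, weights):
    """Combine per-model class labels by weighted (hard) voting.

    predictions: {model_name: [label, ...]}, all lists the same length
    weights:     {model_name: weight}
    """
    n_samples = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_samples):
        tally = defaultdict(float)
        for name, labels in predictions.items():
            tally[labels[i]] += weights[name]  # each model adds its weight
        combined.append(max(tally, key=tally.get))
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (assumption): 1 = malignant, 0 = benign. Not taken from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "LR":  [1, 0, 1, 0, 0, 1, 1, 0],
    "RF":  [1, 0, 0, 1, 0, 0, 1, 1],
    "SVM": [1, 0, 1, 1, 0, 0, 0, 0],
}
# Hypothetical per-model scores; the best model (SVM) gets the largest weight.
weights = normalize_weights({"LR": 0.95, "RF": 0.96, "SVM": 0.98})

combined = weighted_vote(predictions, weights)
metrics = binary_metrics(y_true, combined)
print(combined)
print(metrics)
```

With three models of nearly equal weight, the result coincides with simple majority voting; the weights only change the outcome when one model's weight exceeds the combined weight of the models that disagree with it.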