Comparative Study of Machine Learning Models on Multiple Breast Cancer Datasets

Bibliographic Details
Published in International Journal of Advanced Science Computing and Engineering Vol. 5; no. 1; pp. 15-24
Main Authors Hussain Sujon, Md. Arman, Mustafa, Hossen
Format Journal Article
Language English
Published 16.01.2023
Summary: Breast carcinoma is one of the most feared and most frequently occurring cancers among women. It affects about 10% of women worldwide at some point in their lives. Although a cure for this cancer is available, treatment is not effective enough if the disease is not identified at an early stage. Contemporary medical tests such as roentgenogram, breast ultrasound, and biopsy are generally used to identify breast cancer. As an alternative, researchers are exploring machine learning techniques for classifying tumours, e.g., as benign or malignant. Classification and data processing strategies can be effective mechanisms for the prediction of cancer. In this paper, we analyze six classification models on three different datasets: Decision Tree, K Nearest Neighbours, Random Forest, Logistic Regression, Extra Trees, and Support Vector Machine. We applied simple principal component analysis (PCA) to reduce the dimensionality of the datasets. Experimental results show that Random Forest obtained the best accuracy, recall, and F1 score among the six classification techniques on all three datasets. We also find that data attributes and values are important for accurate classification.
ISSN:2714-7533
DOI:10.62527/ijasce.5.1.105
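
The workflow outlined in the summary (PCA for dimensionality reduction, then six classifiers compared on accuracy, recall, and F1 score) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the record does not name the three datasets, the number of principal components, the train/test split, or any hyperparameters. The example assumes scikit-learn, uses its bundled Wisconsin Diagnostic breast cancer dataset as a stand-in, and picks 10 components, a 75/25 split, and default model settings arbitrarily. The function name `compare_models` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(n_components=10, seed=0):
    """Fit six classifiers on PCA-reduced features and score each one.

    Assumptions (not from the paper): dataset, n_components, split size,
    and default hyperparameters.
    """
    # Stand-in dataset: target 0 = malignant, 1 = benign.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "K Nearest Neighbours": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Extra Trees": ExtraTreesClassifier(random_state=seed),
        "Support Vector Machine": SVC(),
    }

    results = {}
    for name, model in models.items():
        # Standardize, project onto the leading principal components,
        # then classify. The scaler and PCA are fit on training data only.
        pipeline = make_pipeline(
            StandardScaler(), PCA(n_components=n_components), model
        )
        pipeline.fit(X_train, y_train)
        predicted = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            # Treat malignant (label 0) as the positive class.
            "recall": recall_score(y_test, predicted, pos_label=0),
            "f1": f1_score(y_test, predicted, pos_label=0),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_models().items():
        print(
            f"{name:24s} accuracy={scores['accuracy']:.3f} "
            f"recall={scores['recall']:.3f} f1={scores['f1']:.3f}"
        )
```

Wrapping the scaler and PCA in a pipeline keeps both fitted on the training split only, so the test scores are not influenced by test data. The ranking this sketch produces on the stand-in dataset need not match the ranking reported in the paper.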