Semi-supervised Learning - An Alternative to Traditional Breast Cancer Prediction

Bibliographic Details
Published in: 2025 International Conference on Artificial Intelligence and Data Engineering (AIDE), pp. 319-325
Main Authors: Manikanta, Suthari; Rasheed, Shaik Mohammed; Reddy, Jonnala Kowshik; Sankar, Seemakurthi Naga Surya Bhavani; Harshendra, Avula Venkata Sai; Srinivas, M.
Format: Conference Proceeding
Language: English
Published: IEEE, 06.02.2025
DOI: 10.1109/AIDE64228.2025.10987549

Summary: Breast cancer continues to pose diagnostic challenges because its characteristics are diverse and overlapping. This study evaluates various cases for each stage of preprocessing and optimization in Supervised Learning (SL) and Semi-Supervised Learning (SSL) to attain optimal predictive performance. Nine machine learning classifiers were trained and tested, for both SL and SSL models, on the Wisconsin Diagnostic Cancer Dataset: 1) Logistic Regression (LR), 2) Gaussian Naive Bayes (GNB), 3) Linear-SVM, 4) RBF-SVM, 5) Decision Tree (DT), 6) Random Forest (RF), 7) XGBoost, 8) Gradient Boosting (GB), and 9) K-Nearest Neighbors (KNN). The process incorporated feature extraction techniques, including Linear Discriminant Analysis (LDA) and Feature Agglomeration, and feature selection through regularization, employing both L2 and Bayesian methods. For hyperparameter optimization, a comprehensive approach combining Stratified K-fold and nested cross-validation was adopted. The model performed well, achieving an accuracy of 99% compared with SL, which shows that SSL can be an alternative to SL when labeled data is expensive or high computational resources are required.
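The summary does not say which SSL scheme the authors used. As a rough illustration of the general idea only, the sketch below assumes self-training (pseudo-labeling) around a nearest-centroid classifier and runs on synthetic two-dimensional toy data rather than the Wisconsin dataset. All function names, the `margin` threshold, and the data are hypothetical and are not taken from the paper.

```python
import random

def make_toy_data(n_per_class=100, seed=0):
    """Two synthetic Gaussian blobs (hypothetical stand-in for real features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Base classifier: one mean vector per class."""
    centroids = {}
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return centroids

def predict_with_margin(centroids, x):
    """Return (nearest class, gap between the two smallest centroid distances)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in centroids.items()
    )
    (d_best, label), (d_next, _) = dists[0], dists[1]
    return label, d_next - d_best

def self_train(X_lab, y_lab, X_unlab, margin=1.0, max_rounds=10):
    """Repeatedly add confidently predicted unlabeled points as pseudo-labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        still_unlabeled, added = [], 0
        for x in pool:
            label, gap = predict_with_margin(centroids, x)
            if gap >= margin:          # confident enough: accept the pseudo-label
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0:                 # nothing new was learned; stop early
            break
    return fit_centroids(X_lab, y_lab), len(pool)

def accuracy(centroids, X, y):
    hits = sum(predict_with_margin(centroids, x)[0] == t for x, t in zip(X, y))
    return hits / len(X)

# Hold out 50 test samples, then keep true labels for only 5 samples per class.
X, y = make_toy_data()
idx = list(range(len(X)))
random.Random(1).shuffle(idx)
test_idx, train_idx = idx[:50], idx[50:]
labeled_idx = []
for lab in (0, 1):
    labeled_idx += [i for i in train_idx if y[i] == lab][:5]
labeled = set(labeled_idx)
unlab_idx = [i for i in train_idx if i not in labeled]

X_lab, y_lab = [X[i] for i in labeled_idx], [y[i] for i in labeled_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

sl_model = fit_centroids(X_lab, y_lab)                       # labeled data only
ssl_model, n_left = self_train(X_lab, y_lab, [X[i] for i in unlab_idx])
sl_acc = accuracy(sl_model, X_test, y_test)
ssl_acc = accuracy(ssl_model, X_test, y_test)
print(f"SL accuracy: {sl_acc:.2f}  SSL accuracy: {ssl_acc:.2f}  unlabeled left: {n_left}")
```

The supervised baseline sees only the ten labeled points, while the self-trained model also draws on the unlabeled pool. This is the setting the summary points to, where labels are expensive to obtain. In practice the base learner would be one of the nine classifiers listed above, tuned with stratified and nested cross-validation.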