Light-GBM based minority oversampling model using biomedical data analysis for breast cancer classification

Bibliographic Details
Published in: Discover applied sciences, Vol. 7, no. 7, pp. 1-28
Main Authors: Soni, Mukesh; Bhatt, Mohammed Wasim; Ofori-Amanfo, Paul
Format: Journal Article
Language: English
Published: Cham, Springer International Publishing (Springer), 11.07.2025
Summary: The yearly incidence of breast cancer, already among the highest of all cancers, is steadily rising. Predicting whether a tumor is benign or malignant from indicators of cell nuclei, without surgical biopsy, can effectively assist doctors in diagnosis and reduce patients’ suffering. Research consistently shows that LightGBM-based hybrid models outperform baseline and conventional classifiers in accuracy, speed, and efficiency. This study proposes a breast cancer identification model based on the Light Gradient Boosting Machine (Light-GBM) algorithm. To address the skewed class distribution of breast cancer diagnostic data, the Borderline-SMOTE method is used. The Sparrow Search Algorithm (SSA) is improved with a piecewise linear chaotic map (PWLCM), novel inertia weights, and a new longitudinal-lateral crossover algorithm, and the improved SSA is then applied to automatic parameter optimization of Light-GBM. Because Light-GBM is sensitive to noise, an OVR-Jacobian regularization method is proposed for denoising. This strengthens the ensemble model, which is then used for breast cancer diagnosis. According to the experimental data, the suggested ensemble model outperforms standard models in terms of mean square error, coefficient of determination, and cross-validation score, demonstrating better diagnostic performance.

Article Highlights
This study proposes a novel SSA, improved by multiple means, to optimize the parameters of the Light-GBM model, reducing parameter search time and improving overall model accuracy.
This paper introduces a new OVR-Jacobian regularization method to eliminate the original model's sensitivity to noise and its problem of uncontrollable complexity.
This method effectively prevents overfitting and balances the model between complexity and diagnostic performance.
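
The summary mentions a piecewise linear chaotic map (PWLCM) as one of the improvements to the SSA. As a rough illustration only, the sketch below uses the commonly cited textbook form of the PWLCM to spread an initial population of candidate hyperparameter vectors across a search box. The control parameter `p = 0.4`, the helper names `pwlcm` and `init_population`, and the example bounds are assumptions made for illustration; this record does not state the authors' exact formulation or settings.

```python
import random


def pwlcm(x: float, p: float = 0.4) -> float:
    """One step of the piecewise linear chaotic map (textbook form, 0 < p < 0.5).

    Maps a value in [0, 1] to another value in [0, 1].
    """
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def init_population(n_agents, bounds, p=0.4, seed=0):
    """Build n_agents candidate vectors by iterating the chaotic map.

    bounds is a list of (low, high) pairs, one per dimension.
    Hypothetical helper, not taken from the article.
    """
    rng = random.Random(seed)
    x = rng.uniform(0.01, 0.99)  # random starting point for the chaotic sequence
    population = []
    for _ in range(n_agents):
        agent = []
        for low, high in bounds:
            x = pwlcm(x, p)
            agent.append(low + (high - low) * x)  # scale [0, 1] to [low, high]
        population.append(agent)
    return population


if __name__ == "__main__":
    # Assumed search ranges for two LightGBM parameters:
    # learning_rate in [0.01, 0.3] and num_leaves in [8, 128].
    for agent in init_population(5, [(0.01, 0.3), (8, 128)]):
        print(agent)
```

In a full optimizer, each vector would be scored (for example by cross-validated error of a Light-GBM model) and then updated by the SSA position rules, which are not reproduced here.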
ISSN: 3004-9261
DOI: 10.1007/s42452-025-07390-7