Application of support vector machine algorithm for early differential diagnosis of prostate cancer


Bibliographic Details
Published in Data science and management Vol. 6; no. 1; pp. 1-12
Main Authors Akinnuwesi, Boluwaji A.; Olayanju, Kehinde A.; Aribisala, Benjamin S.; Fashoto, Stephen G.; Mbunge, Elliot; Okpeku, Moses; Owate, Patrick
Format Journal Article
Language English
Published Elsevier B.V 01.03.2023
KeAi Communications Co. Ltd
ISSN 2666-7649
DOI 10.1016/j.dsm.2022.10.001

Summary: Prostate cancer (PCa) symptoms are commonly confused with those of benign prostatic hyperplasia (BPH), particularly in the early stages, because the symptoms are similar; in some instances this leads to underdiagnosis. Clinical methods have been utilized to diagnose PCa; however, at the full-blown stage, clinical methods usually present high risks of complicated side effects. Therefore, we proposed the use of a support vector machine for early differential diagnosis of PCa (SVM-PCa-EDD). SVM was used to classify persons with and without PCa. We used the PCa dataset from the Kaggle Healthcare repository to develop and validate the SVM model for classification. The PCa dataset consisted of 250 features and one class of features. Attributes considered in this study were age, body mass index (BMI), race, family history, obesity, trouble urinating, urine stream force, blood in semen, bone pain, and erectile dysfunction. The SVM-PCa-EDD was used for preprocessing the PCa dataset, specifically dealing with class imbalance, and for dimensionality reduction. After eliminating class imbalance, the area under the receiver operating characteristic (ROC) curve (AUC) of the logistic regression (LR) model trained with the downsampled dataset was 58.4%, whereas the AUC of the LR model trained with the class-imbalanced dataset was 54.3%. The SVM-PCa-EDD achieved 90% accuracy, 80% sensitivity, and 80% specificity. The validation of SVM-PCa-EDD against random forest and LR showed that SVM-PCa-EDD performed better in early differential diagnosis of PCa. The proposed model can assist medical experts in early diagnosis of PCa, particularly in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
• Prostate cancer (PCa) was reported as one of the most frequently diagnosed cancers in males.
• PCa symptoms are easily confused with those of other prostate diseases; they are not very obvious at the early stages but become obvious at later stages.
• Support vector machine (SVM) is used for early differential diagnosis of PCa.
• The model (SVM-PCa-EDD) achieves 90% accuracy, 80% sensitivity, and 80% specificity.
• The proposed model can assist medical experts in early diagnosis of PCa, especially in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
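
The workflow the summary describes (downsample the majority class, train an SVM, then report accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state the kernel, library, or preprocessing details, so the sketch assumes a linear SVM trained with a Pegasos-style subgradient method, and it runs on synthetic data from a hypothetical `make_data` helper rather than the Kaggle dataset. All function names and parameter values here are illustrative assumptions.

```python
import random


def make_data(n_neg, n_pos, rng):
    """Synthetic stand-in for a patient table (hypothetical, not the Kaggle data)."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_neg)]
    rows += [[rng.gauss(1.5, 1.0) for _ in range(3)] for _ in range(n_pos)]
    return rows, [0] * n_neg + [1] * n_pos


def downsample(rows, labels, seed=0):
    """Randomly drop rows of the larger class until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def train_linear_svm(rows, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(rows[0]), 0.0, 0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            y = 1 if labels[i] == 1 else -1    # SVM labels are -1 / +1
            margin = y * (sum(wj * xj for wj, xj in zip(w, rows[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]   # shrink (regularization)
            if margin < 1:                           # point violates the margin
                w = [wj + eta * y * xj for wj, xj in zip(w, rows[i])]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return 1 (positive class) when the decision value is non-negative, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


rng = random.Random(42)
X_train, y_train = make_data(400, 100, rng)    # imbalanced training set (4:1)
X_test, y_test = make_data(80, 20, rng)        # held-out test set
X_bal, y_bal = downsample(X_train, y_train)    # 100 rows per class
w, b = train_linear_svm(X_bal, y_bal)
metrics = confusion_metrics(y_test, [predict(w, b, x) for x in X_test])
print(metrics)
```

Only the training set is downsampled here; the test set keeps its original class ratio so that sensitivity and specificity are measured on an untouched sample. Whether the paper followed the same split is not stated in this record.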