Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective
Published in | Computer methods and programs in biomedicine Vol. 220; p. 106773 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Ireland, Elsevier B.V, 01.06.2022 |
Subjects | |
Summary: | • Clinicians seek a reliable one-time diagnostic system for diabetes that does not require several blood sugar tests.
• We formulated effective data preprocessing steps for feature selection and missing value imputation.
• We developed a deep neural network model for accurately predicting diabetes mellitus and determining the disease severity during diagnosis.
• Train and test prediction accuracies of 99.01% and 97.25% are achieved with the PIMA Indian dataset, while 99.57% and 97.33% are achieved with the LMCH dataset.
• The model outperforms state-of-the-art results by a margin of 8.68% to 21.99%.
• This work can be reproduced with the publicly available source code.
Diabetes mellitus is a metabolic disorder characterized by hyperglycemia, which results from the body's inability to adequately secrete or respond to insulin. If not diagnosed in time or properly managed, diabetes can damage vital organs such as the eyes, kidneys, nerves, heart, and blood vessels, and so can be life-threatening. Many years of research in the computational diagnosis of diabetes have pointed to machine learning as a viable solution for the prediction of diabetes. However, the accuracy achieved to date suggests that there is still much room for improvement. In this paper, we propose a machine learning framework for diabetes prediction and diagnosis using the PIMA Indian dataset and the laboratory of the Medical City Hospital (LMCH) diabetes dataset. We hypothesize that adopting feature selection and missing value imputation methods can improve the performance of classification models in diabetes prediction and diagnosis.
In this paper, a robust framework for building a diabetes prediction model to aid in the clinical diagnosis of diabetes is proposed. The framework adopts Spearman correlation for feature selection and polynomial regression for missing value imputation, each applied in a way that strengthens its performance. Further, several supervised machine learning models are proposed for classification: the random forest (RF) model, the support vector machine (SVM) model, and our designed twice-growth deep neural network (2GDNN) model. The models are optimized by tuning their hyperparameters using grid search and repeated stratified k-fold cross-validation, and are evaluated for their ability to scale to the prediction problem.
In experiments with the proposed 2GDNN model, precision, sensitivity, F1-score, train accuracy, and test accuracy of 97.34%, 97.24%, 97.26%, 99.01%, and 97.25% are achieved on the PIMA Indian dataset, and 97.28%, 97.33%, 97.27%, 99.57%, and 97.33% on the LMCH diabetes dataset.
The data preprocessing approaches and the classifiers with hyperparameter optimization proposed within the machine learning framework yield a robust machine learning model that outperforms state-of-the-art results in diabetes mellitus prediction and diagnosis. The source code for the models of the proposed machine learning framework has been made publicly available. |
---|---|
ISSN: | 0169-2607 1872-7565 |
DOI: | 10.1016/j.cmpb.2022.106773 |
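
The summary names Spearman correlation for feature selection and polynomial regression for missing value imputation but does not spell out the procedure. The following is a minimal sketch of how those two steps might look, not the authors' implementation. The function names, the correlation cutoff of 0.3, the polynomial degree of 2, and the synthetic `glucose`/`insulin` data are all assumptions made for illustration.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace each rank by the mean rank of its tie group.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def select_features(X, y, names, threshold=0.3):
    """Keep features whose |rho| with the label reaches the (assumed) cutoff."""
    return [name for j, name in enumerate(names)
            if abs(spearman(X[:, j], y)) >= threshold]

def impute_polynomial(target, predictor, degree=2):
    """Fill NaNs in `target` using a polynomial fitted on the observed rows."""
    target = np.asarray(target, dtype=float).copy()
    predictor = np.asarray(predictor, dtype=float)
    missing = np.isnan(target)
    coeffs = np.polyfit(predictor[~missing], target[~missing], deg=degree)
    target[missing] = np.polyval(coeffs, predictor[missing])
    return target

# Synthetic data for illustration only (not the PIMA Indian or LMCH datasets).
rng = np.random.default_rng(0)
glucose = rng.uniform(70, 200, size=200)
noise = rng.normal(size=200)
label = (glucose > 130).astype(float)

# A feature that depends on glucose, with every tenth value missing.
insulin = 0.01 * glucose**2 + 5.0
insulin_missing = insulin.copy()
insulin_missing[::10] = np.nan

X = np.column_stack([glucose, noise])
selected = select_features(X, label, ["glucose", "noise"])
filled = impute_polynomial(insulin_missing, glucose)
```

In this sketch each feature is scored against the class label, and the imputation predicts a missing feature from a single correlated feature. Both are design choices made here for brevity; the published framework may select and impute differently.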