A Neighborhood-Similarity-Based Imputation Algorithm for Healthcare Data Sets: A Comparative Study

Bibliographic Details
Published in Electronics (Basel) Vol. 12; no. 23; p. 4809
Main Authors Wilcox, Colin, Giagos, Vasileios, Djahel, Soufiene
Format Journal Article
Language English
Published Basel MDPI AG 01.12.2023
Summary: The increasing computerisation of medical services has highlighted inconsistencies in the way in which patients’ historic medical data were recorded. Differences in process and practice between medical services and facilities have led to many incomplete and inaccurate medical histories being recorded. To create a single point of truth going forward, it is necessary to correct these inconsistencies. A common way to do this has been to use imputation techniques to predict missing data values based on the known values in the data set. In this paper, we propose a neighborhood similarity measure-based imputation technique and analyze its achieved prediction accuracy in comparison with a number of traditional imputation methods using both an incomplete anonymized diabetes medical data set and a number of simulated data sets as the sources of our data. The aim is to determine whether any improvement could be made in the accuracy of predicting a diabetes diagnosis using the known outcomes of the diabetes patients’ data set. The obtained results have proven the effectiveness of our proposed approach compared to other state-of-the-art single-pass imputation techniques.
ISSN:2079-9292
DOI:10.3390/electronics12234809