Comparative Methods for Addressing Imbalanced Datasets in Predicting Medical Appointment No-Shows
Published in | 2024 L Latin American Computer Conference (CLEI) pp. 1 - 10 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 12.08.2024 |
Subjects | |
Summary: | The efficiency and accessibility of healthcare delivery systems can be enhanced through solutions that minimize the impact of patient No-Shows for medical exams and appointments. The significant disparity between the "Show" and "No-Show" categories within the dataset can impair the efficacy of predictive models, necessitating the employment of dataset balancing techniques before classifier training. This study evaluated various methods to address imbalances in datasets for predicting patient No-Shows. Techniques such as undersampling (Random Removal - RR, Remove Similar - RS, Remove Farthest - RF) and oversampling (Adaptive Synthetic Sampling - ADASYN) were applied to adjust the balance between Show and No-Show categories to ratios of 80-20%, 70-30%, 60-40%, and 50-50%, alongside a cost-sensitive learning approach. Four classifiers were deployed: Support Vector Machine (SVM), Naive Bayes, k-Nearest Neighbors (k-NN), and Random Forests. Additionally, a decision tree produced by the C4.5 algorithm was utilized for the cost-sensitive learning approach. The classifiers were evaluated using metrics such as Precision, Recall, F-measure, and AUC/ROC. Among the various methods tested, the RR combined with the k-NN classifier achieved the highest AUC/ROC value. However, due to the longer computational time of k-NN, the Random Forest classifier emerged as a more pragmatic choice when processing time is a critical consideration. |
---|---|
ISSN: | 2771-5752 |
DOI: | 10.1109/CLEI64178.2024.10700560 |
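
The summary describes undersampling the majority "Show" class until the Show/No-Show split reaches a target ratio such as 80-20% or 50-50%. The following is a minimal sketch of the Random Removal (RR) idea only, assuming RR means dropping majority-class records uniformly at random; the record does not give the authors' implementation. The function name `random_removal`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def random_removal(rows, labels, majority_share, majority_label="Show", seed=0):
    """Randomly drop majority-class rows until that class makes up
    `majority_share` of the data (e.g. 0.7 for a 70-30 split).

    Hypothetical helper for illustration; not the paper's code.
    """
    if not 0.5 <= majority_share < 1.0:
        raise ValueError("majority_share must be in [0.5, 1.0)")

    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]

    # Solve keep / (keep + n_minority) = majority_share for keep.
    keep = round(len(minority_idx) * majority_share / (1.0 - majority_share))
    # Undersampling can only remove rows, never add them.
    keep = min(keep, len(majority_idx))

    rng = random.Random(seed)
    kept_majority = rng.sample(majority_idx, keep)

    # Preserve the original row order among the surviving records.
    selected = sorted(kept_majority + minority_idx)
    return [rows[i] for i in selected], [labels[i] for i in selected]


if __name__ == "__main__":
    # Toy data (made up): 900 "Show" and 100 "No-Show" appointments.
    toy_labels = ["Show"] * 900 + ["No-Show"] * 100
    toy_rows = list(range(len(toy_labels)))
    for share in (0.8, 0.7, 0.6, 0.5):
        _, balanced = random_removal(toy_rows, toy_labels, share)
        print(share, Counter(balanced))
```

In this sketch the minority class is never touched, so the number of No-Show records stays fixed and only the Show count shrinks as the target ratio approaches 50-50%. The balanced rows would then be passed to whichever classifier is under evaluation.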