Electronic Health Data in the Context of Patient Length-of-Stay Prediction: Using Generative Adversarial Nets for Synthetic Data Creation

Bibliographic Details
Published in 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE) pp. 1597 - 1604
Main Authors Bietsch, Dominik; Stahlbock, Robert; Voß, Stefan
Format Conference Proceeding
Language English
Published IEEE 24.07.2023
Summary: While generative artificial intelligence has gained popularity (e.g., for the creation of images), it can also be used to create synthetic tabular data. This holds great potential, especially for the healthcare industry, where data is often scarce and subject to privacy restrictions. For instance, the creation of synthetic electronic health records (EHR) promises to improve the use of machine learning (ML) algorithms, which normally require large amounts of data. This also applies to the prediction of patient length of stay (LOS), a key measure for hospitals and one of the core tools decision-makers use to plan the allocation of resources. This paper aims to add to the young body of research on the application of generative adversarial nets (GAN) to tabular EHR. The intention is to leverage the advantages of synthetic data for LOS prediction in order to contribute to the efficiency-enhancing and cost-saving aspirations of hospitals and insurance companies. To that end, the paper examines the applicability of GAN-generated synthetic data as a proxy for scarce real-world EHR in the patient LOS multi-class classification task. Two models are selected: the Conditional Tabular GAN (CTGAN) and the Copula GAN. The CTGAN is found to be the superior model for the underlying use case. Nevertheless, the paper shows that there is still room for improvement when applying state-of-the-art GAN architectures to EHR.
DOI: 10.1109/CSCE60160.2023.00262