Generating synthetic mixed-type tabular data by decoding samples from a latent-space: a case study in healthcare

Bibliographic Details
Published in: Procedia Computer Science, Vol. 246, pp. 2254-2263
Main Authors: Drapała, Jarosław; Świątek, Jerzy
Format: Journal Article
Language: English
Published: Elsevier B.V., 2024

Summary: Medical data are subject to privacy regulations, which severely limit AI specialists who wish to build decision support systems for medicine. A large share of these data are tabular, that is, organized into tables in which patient records occupy rows and measured variables occupy columns. Furthermore, the variables are of mixed types: some are numerical, while others are categorical. In this work, we introduce a novel method for constructing generators of synthetic tabular data with mixed types. The key point of our approach is the explicit use of a latent space to represent the original data. A case study using real medical data is presented.
ISSN: 1877-0509
DOI: 10.1016/j.procs.2024.09.569
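
The Summary describes the approach only at a high level: represent the original mixed-type table in a latent space, then decode samples drawn from that space into synthetic records. The sketch below illustrates that general idea and is not the authors' method. It assumes a linear, PCA-style encoder and decoder as a stand-in for whatever learned model the paper uses, a Gaussian fitted to the latent codes as the sampling distribution, and integer-coded categorical columns. All function and parameter names (encode_table, fit_linear_latent, generate, n_categories, latent_dim) are hypothetical.

```python
import numpy as np


def encode_table(numeric, categorical, n_categories):
    """Standardize numeric columns and one-hot encode integer-coded categorical columns."""
    mean = numeric.mean(axis=0)
    std = numeric.std(axis=0) + 1e-8  # avoid division by zero for constant columns
    blocks = [(numeric - mean) / std]
    for j, k in enumerate(n_categories):
        blocks.append(np.eye(k)[categorical[:, j]])  # one-hot rows for column j
    return np.hstack(blocks), mean, std


def fit_linear_latent(x, latent_dim):
    """Fit a PCA-style linear map (assumption: a stand-in for a learned encoder/decoder)."""
    center = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - center, full_matrices=False)
    return center, vt[:latent_dim]  # rows are the principal directions


def generate(numeric, categorical, n_categories, latent_dim, n_samples, seed=0):
    """Sample latent codes from a fitted Gaussian and decode them into mixed-type rows."""
    rng = np.random.default_rng(seed)
    x, mean, std = encode_table(numeric, categorical, n_categories)
    center, components = fit_linear_latent(x, latent_dim)

    # Encode the original rows into the latent space.
    z = (x - center) @ components.T

    # Assumption: model the latent codes with a single multivariate Gaussian.
    z_mean = z.mean(axis=0)
    z_cov = np.atleast_2d(np.cov(z, rowvar=False))
    z_new = rng.multivariate_normal(z_mean, z_cov, size=n_samples)

    # Decode the sampled codes back to the encoded feature space.
    x_new = z_new @ components + center

    # Numeric columns: undo the standardization.
    d = numeric.shape[1]
    synth_numeric = x_new[:, :d] * std + mean

    # Categorical columns: pick the category with the largest decoded score.
    synth_categorical = np.empty((n_samples, len(n_categories)), dtype=int)
    start = d
    for j, k in enumerate(n_categories):
        synth_categorical[:, j] = np.argmax(x_new[:, start:start + k], axis=1)
        start += k
    return synth_numeric, synth_categorical
```

With a toy table of two numeric columns and one three-level categorical column, generate(numeric, categorical, [3], latent_dim=2, n_samples=10) would return a 10 x 2 float array and a 10 x 1 integer array of category codes. A linear map with argmax decoding is chosen here only for brevity; it says nothing about privacy guarantees or about how closely the synthetic rows match the original distribution.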