Generating synthetic mixed-type tabular data by decoding samples from a latent-space: a case study in healthcare
Published in | Procedia computer science Vol. 246; pp. 2254 - 2263 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 2024 |
Subjects | |
Summary: | Medical data are subject to privacy regulations, which severely limit AI specialists who wish to construct decision support systems for medicine. A large share of these data are tabular: patient records are represented in rows and measured variables in columns. Furthermore, the variables come in different types: some are numerical, while others are categorical. In this work, we introduce a novel method for constructing generators of synthetic tabular data with mixed types. The key point of our approach is the explicit use of a latent space to represent the original data. A case study using real medical data is presented. |
ISSN: | 1877-0509 |
DOI: | 10.1016/j.procs.2024.09.569 |
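
The summary describes the general idea only: represent mixed-type records in a latent space, sample there, and decode the samples back into rows. A minimal sketch of that idea follows, for illustration. It is not the authors' method: PCA stands in for whatever latent representation the paper learns, a single Gaussian stands in for the latent sampler, the toy data are random, and the function names `fit_latent_generator` and `sample_synthetic` are hypothetical.

```python
import numpy as np


def fit_latent_generator(numeric, categorical, n_latent=2):
    """Fit a toy latent-space generator (assumption: PCA + Gaussian latent).

    numeric:     (n, p) float array of numerical columns
    categorical: (n,) int array of category codes 0..k-1 (one categorical column)
    """
    mu = numeric.mean(axis=0)
    sigma = numeric.std(axis=0) + 1e-12            # avoid division by zero
    n_cat = int(categorical.max()) + 1
    onehot = np.eye(n_cat)[categorical]            # (n, k) one-hot encoding
    x = np.hstack([(numeric - mu) / sigma, onehot])

    # Linear "encoder": project centered rows onto the top principal directions.
    center = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - center, full_matrices=False)
    w = vt[:n_latent]                              # (n_latent, p + k)
    z = (x - center) @ w.T                         # latent codes of the real rows

    return {
        "mu": mu, "sigma": sigma, "center": center, "w": w,
        "z_mean": z.mean(axis=0),
        "z_cov": np.atleast_2d(np.cov(z, rowvar=False)),
        "n_num": numeric.shape[1],
    }


def sample_synthetic(model, n, rng):
    """Sample latent points and decode them into mixed-type rows."""
    z = rng.multivariate_normal(model["z_mean"], model["z_cov"], size=n)
    x = z @ model["w"] + model["center"]           # linear "decoder"
    k = model["n_num"]
    num = x[:, :k] * model["sigma"] + model["mu"]  # undo standardization
    cat = x[:, k:].argmax(axis=1)                  # most likely category code
    return num, cat


# Toy stand-in for a patient table: 3 numerical columns, 1 categorical column.
rng = np.random.default_rng(0)
real_numeric = rng.normal(size=(200, 3))
real_categorical = rng.integers(0, 3, size=200)

model = fit_latent_generator(real_numeric, real_categorical)
num, cat = sample_synthetic(model, 5, rng)
print(num.shape, cat.shape)
```

Decoding the categorical column with `argmax` is one simple choice among several; it always returns a valid category code but ignores how uncertain the decoded one-hot scores are.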