CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines

Bibliographic Details
Published in: arXiv.org
Main Authors: Pang, Chao; Jiang, Xinzhuo; Pavinkurve, Nishanth Parameshwar; Kalluri, Krishna S; Minto, Elise L; Patterson, Jason; Zhang, Linying; Hripcsak, George; Gürsoy, Gamze; Elhadad, Noémie; Natarajan, Karthik
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 06.05.2024
Summary: Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
ISSN: 2331-8422