meds_reader: A fast and efficient EHR processing library
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 12.09.2024 |
Summary: | The growing demand for machine learning in healthcare requires processing
increasingly large electronic health record (EHR) datasets, but existing
pipelines are not computationally efficient or scalable. In this paper, we
introduce meds_reader, an optimized Python package for efficient EHR data
processing that is designed to take advantage of many intrinsic properties of
EHR data for improved speed. We then demonstrate the benefits of meds_reader by
reimplementing key components of two major EHR processing pipelines, achieving
10-100x improvements in memory, speed, and disk usage. The code for meds_reader
can be found at https://github.com/som-shahlab/meds_reader. |
---|---|
DOI: | 10.48550/arxiv.2409.09095 |