A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports
Published in | AMIA Summits on Translational Science proceedings Vol. 2025; pp. 598-606 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | United States: American Medical Informatics Association, 2025 |
Summary: | Timing of clinical events is central to characterizing patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series: structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390, respectively) of ten randomly sampled PubMed open-access (PMOA) case reports (N=152,974) and assess inter-LLM agreement (n=3,103; N=93). We find that the LLMs have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). By establishing the task, annotation, and assessment systems, and by demonstrating high concordance, this work may serve as a benchmark for leveraging the PMOA corpus for temporal analytics. Code is available at: https://github.com/jcweiss2/LLM-Timeline-PMOA/. |
---|---|
ISSN: | 2153-4063 |
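
The summary describes a textual time series as structured pairs of textual events and timestamps, evaluated by event recall and temporal concordance. The following is a minimal illustrative sketch of what such a representation and such metrics might look like; it is not the authors' implementation. The names `Timeline`, `normalize`, `event_recall`, and `temporal_concordance` are hypothetical, the timestamps are assumed to be hours relative to admission, events are assumed to match by normalized exact string, and concordance is assumed to be the fraction of event pairs whose ordering agrees. The example events are invented for illustration and are not data from the paper.

```python
from itertools import combinations

# Assumed representation: (event text, time in hours relative to admission).
# Units and reference point are illustrative assumptions.
Timeline = list[tuple[str, float]]


def normalize(event: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(event.lower().split())


def event_recall(reference: Timeline, predicted: Timeline) -> float:
    """Fraction of reference events that also appear in the prediction.

    Assumes exact matching after normalization; a real system may need
    fuzzier matching.
    """
    ref_events = {normalize(e) for e, _ in reference}
    pred_events = {normalize(e) for e, _ in predicted}
    if not ref_events:
        return 0.0
    return len(ref_events & pred_events) / len(ref_events)


def temporal_concordance(reference: Timeline, predicted: Timeline) -> float:
    """Among shared events, fraction of pairs ordered the same way in both timelines.

    Pairs tied in the reference are skipped; a tie in the prediction counts
    as discordant under this assumed definition.
    """
    ref_times = {normalize(e): t for e, t in reference}
    pred_times = {normalize(e): t for e, t in predicted}
    shared = sorted(ref_times.keys() & pred_times.keys())
    concordant = 0
    comparable = 0
    for a, b in combinations(shared, 2):
        ref_diff = ref_times[a] - ref_times[b]
        if ref_diff == 0:
            continue
        comparable += 1
        pred_diff = pred_times[a] - pred_times[b]
        if ref_diff * pred_diff > 0:
            concordant += 1
    return concordant / comparable if comparable else 0.0


# Invented example timelines (not taken from the paper).
manual = [("fever", -48), ("admission", 0), ("CT scan", 2),
          ("surgery", 24), ("discharge", 120)]
llm = [("Fever", -72), ("admission", 0), ("antibiotics", 1),
       ("surgery", -1), ("discharge", 96)]

recall = event_recall(manual, llm)
concordance = temporal_concordance(manual, llm)
```

Separating recall from concordance mirrors the distinction drawn in the summary: recall measures whether an event was found at all, while concordance is computed only over events that both annotations identified.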