Clinical Information Extraction with Large Language Models: A Case Study on Organ Procurement

Bibliographic Details
Published in AMIA ... Annual Symposium proceedings Vol. 2024; p. 115
Main Authors Adam, Hammaad; Lin, Junjing; Lin, Jianchang; Keenan, Hillary; Wilson, Ashia; Ghassemi, Marzyeh
Format Journal Article
Language English
Published United States 2024
ISSN 1942-597X
EISSN 1559-4076

Abstract Recent work has demonstrated that large language models (LLMs) are powerful tools for clinical information extraction from unstructured text. However, existing approaches have largely ignored the extraction of numeric information such as laboratory tests and vital signs. In this article, we present a case study on organ procurement that evaluates the ability of LLMs to extract numeric data from clinical text. We first describe our LLM-based approach, introducing a prompting strategy for numeric extraction and novel heuristics to combat hallucination. We validate our approach on a hand-annotated set of 298 notes, demonstrating that it has high accuracy, precision and recall. We then highlight the value of our approach for downstream data analysis using a corpus of 43,719 notes on 14,342 potential organ donors. This case study is a key component of an ongoing collaboration that aims to make data on organ procurement publicly available for informatics research.
Author Affiliations
Adam, Hammaad (Massachusetts Institute of Technology, Cambridge, MA, USA)
Lin, Junjing (Takeda Pharmaceuticals, Cambridge, MA, USA)
Lin, Jianchang (Takeda Pharmaceuticals, Cambridge, MA, USA)
Keenan, Hillary (Takeda Pharmaceuticals, Cambridge, MA, USA)
Wilson, Ashia (Massachusetts Institute of Technology, Cambridge, MA, USA)
Ghassemi, Marzyeh (Vector Institute, Toronto, Ontario, Canada)
PubMed https://www.ncbi.nlm.nih.gov/pubmed/40417525
Copyright 2024 AMIA - All rights reserved.
Discipline Medicine
IsPeerReviewed true
IsScholarly true
PMID 40417525
Alternate Title AMIA Annu Symp Proc
Subject Terms Humans; Information Storage and Retrieval - methods; Large Language Models; Natural Language Processing; Tissue and Organ Procurement