Adapted large language models can outperform medical experts in clinical text summarization

Bibliographic Details
Published in	Nature Medicine Vol. 30; no. 4; pp. 1134-1142
Main Authors	Van Veen, Dave; Van Uden, Cara; Blankemeier, Louis; Delbrouck, Jean-Benoit; Aali, Asad; Bluethgen, Christian; Pareek, Anuj; Polacin, Malgorzata; Reis, Eduardo Pontes; Seehofnerová, Anna; Rohatgi, Nidhi; Hosamani, Poonam; Collins, William; Ahuja, Neera; Langlotz, Curtis P.; Hom, Jason; Gatidis, Sergios; Pauly, John; Chaudhari, Akshay S.
Format	Journal Article
Language	English
Published	New York: Nature Publishing Group US, 01.04.2024
Subjects	692/308/575; 692/700; 706/703/559; Adaptation; Biomedical and Life Sciences; Biomedicine; Cancer Research; Chatbots; Documentation; Electronic Health Records; Electronic medical records; Humans; Infectious Diseases; Language; Large language models; Medical personnel; Metabolic Diseases; Molecular Medicine; Natural Language Processing; Neurosciences; Patients; Performance assessment; Physician-Patient Relations; Radiology; Semantics; Summaries
Summary:	Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor–patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care. Comparative performance assessment of large language models identified ChatGPT-4 as the best-adapted model across a diverse set of clinical text summarization tasks, and it outperformed 10 medical experts in a reader study.
Bibliography:	Author contributions: D.V.V. collected data, developed code, ran experiments, designed reader studies, analyzed results, created figures and wrote the manuscript. All authors reviewed the manuscript and provided meaningful revisions and feedback. C.V.U., L.B. and J.B.D. provided technical advice, in addition to conducting qualitative analysis (C.V.U.), building infrastructure for the Azure API (L.B.) and implementing the MEDCON metric (J.B.D.). A.A. assisted in model fine-tuning. C.B., A.P., M.P., E.P.R. and A.S. participated in the reader study as radiologists. N.R., P.H., W.C., N.A. and J.H. participated in the reader study as hospitalists. C.P.L., J.P. and A.S.C. provided student funding. S.G. advised on study design, for which J.H. and J.P. provided additional feedback. J.P. and A.S.C. guided the project, with A.S.C. serving as principal investigator and advising on technical details and overall direction. No funders or third parties were involved in study design, analysis or writing.
ISSN:	1078-8956, 1546-170X
DOI:	10.1038/s41591-024-02855-5