Secondary Use of Clinical Problem List Descriptions for Bi-Encoder Based ICD-10 Classification

Bibliographic Details
Published in AMIA ... Annual Symposium proceedings Vol. 2024; p. 620
Main Authors Kreuzthaler, Markus; Pfeifer, Bastian; Schulz, Stefan
Format Journal Article
Language English
Published United States 2024
ISSN 1942-597X, 1559-4076

Summary: Annotated language resources are essential for supervised machine learning methods. In the clinical domain, such data sets can boost use-case-specific natural language processing services. In this work, we analyzed a clinical problem list table consisting of millions of ICD-10 codes assigned to short problem list descriptions in German. We investigated whether these data form a valuable resource in a secondary-use scenario for coding support. Our methodology uses an embedding-based k-NN classifier, which was evaluated on its coding performance, comparing the multilingual BERT-based language model SapBERT-UMLS with medBERT.de, which is specifically tailored to German medical and clinical language resources. Our approach reached a weighted F1-measure of 0.87 using SapBERT-UMLS and an F1-measure of 0.86 for medBERT.de. The approach showed promising coding results when reusing annotated language resources from routine clinical documentation.
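The embedding-based k-NN classification and the weighted F1-measure mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the value of k, the similarity measure, or the voting scheme, so cosine similarity with similarity-weighted voting is an assumption here, and the function names (`knn_predict`, `weighted_f1`) are hypothetical. The two-dimensional vectors are hand-made stand-ins for sentence embeddings, and the ICD-10 codes serve only as example labels.

```python
from collections import Counter

import numpy as np


def normalize(x):
    """Scale each row vector to unit length (guarding against zero norms)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def knn_predict(query_vecs, index_vecs, index_codes, k=3):
    """Assign each query the code with the largest summed cosine similarity
    among its k nearest indexed descriptions (assumed voting scheme)."""
    sims = normalize(query_vecs) @ normalize(index_vecs).T
    top_k = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar rows
    preds = []
    for neighbours, row_sims in zip(top_k, sims):
        votes = Counter()
        for j in neighbours:
            votes[index_codes[j]] += row_sims[j]
        preds.append(max(votes, key=votes.get))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy "index": placeholder embeddings of already coded problem list entries.
index_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
index_codes = ["I10", "I10", "E11.9", "E11.9"]

# Toy queries: placeholder embeddings of uncoded descriptions.
query_vecs = [[0.8, 0.2], [0.2, 0.8]]
predictions = knn_predict(query_vecs, index_vecs, index_codes, k=3)
score = weighted_f1(["I10", "E11.9"], predictions)
```

In a realistic setting, the vectors would come from a sentence encoder such as the two models compared in the summary, and the index would hold the embeddings of the coded problem list descriptions.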