Assigning clinical codes with data-driven concept representation on Dutch clinical free text

Bibliographic Details
Published in: Journal of Biomedical Informatics, Vol. 69, pp. 118-127
Main Authors: Scheurwegs, Elyne; Luyckx, Kim; Luyten, Léon; Goethals, Bart; Daelemans, Walter
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 01.05.2017
Summary:
• Clinical code assignment in a non-English setting.
• Unsupervised medical concept extraction with an unlabelled corpus.
• Distributional semantics to expand concept definitions.

Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnoses and the procedures performed during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques to extract multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case, the word2vec skip-gram model) is used to generalise over concepts and to retrieve relations between them. These methods are validated on three patient-stay datasets from the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which limits the concept definitions available from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context improves the prediction of diagnostic codes.
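The summary does not say how concepts are extracted from the unannotated corpus. As a rough illustration only, the sketch below uses pointwise mutual information (PMI) over adjacent word pairs, a common unsupervised way to find multi-word expressions; it is a swapped-in technique, not necessarily the method used in the paper. It then adds any matched concepts as extra features on top of plain bag-of-words counts. The functions `extract_bigram_concepts` and `concept_features`, the thresholds `min_count` and `min_pmi`, and the toy Dutch corpus are all hypothetical.

```python
import math
from collections import Counter


def extract_bigram_concepts(sentences, min_count=2, min_pmi=1.0):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Returns {"w1 w2": pmi} for bigrams seen at least `min_count` times
    whose PMI is at least `min_pmi`. No labels are needed: only raw,
    tokenised sentences from an unannotated corpus.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    concepts = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to estimate reliably
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            concepts[f"{w1} {w2}"] = pmi
    return concepts


def concept_features(tokens, concepts):
    """Bag-of-words counts, plus one count per matched multi-word concept."""
    features = Counter(tokens)
    for w1, w2 in zip(tokens, tokens[1:]):
        phrase = f"{w1} {w2}"
        if phrase in concepts:
            features[phrase] += 1
    return features


# Toy Dutch corpus (invented for illustration):
# "acuut nierfalen" = acute kidney failure, "pijn op de borst" = chest pain.
CORPUS = [
    "patient met acuut nierfalen".split(),
    "opname wegens acuut nierfalen".split(),
    "patient met pijn op de borst".split(),
    "pijn op de borst bij inspanning".split(),
    "patient met koorts".split(),
]

if __name__ == "__main__":
    concepts = extract_bigram_concepts(CORPUS, min_count=2, min_pmi=3.5)
    print(sorted(concepts))
    print(concept_features("opname wegens acuut nierfalen".split(), concepts))
```

On this toy corpus, a threshold of `min_pmi=3.5` keeps "acuut nierfalen" and the pieces of "pijn op de borst" but drops the frequent, uninformative pair "patient met". Real corpora would need far larger counts, and longer expressions would need more than bigrams.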
Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance.
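To make the trade-off above concrete, here is a minimal sketch of neighbour-based feature expansion, assuming each token already has an embedding vector. In practice those vectors would come from a trained word2vec skip-gram model; here they are hand-made 3-dimensional toy values. Each token contributes extra `sim:` features for its nearest neighbours by cosine similarity, and a similarity cutoff limits how many loosely related neighbours get in. The names `nearest_neighbours`, `expand_features`, the `sim:` prefix, and the parameters `k` and `min_sim` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(term, vectors, k=2, min_sim=0.8):
    """Up to k most similar terms, keeping only those with cosine >= min_sim."""
    if term not in vectors:
        return []  # out-of-vocabulary tokens get no expansion
    scored = [(other, cosine(vectors[term], vec))
              for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, sim in scored[:k] if sim >= min_sim]


def expand_features(tokens, vectors, k=2, min_sim=0.8):
    """Bag-of-words counts plus a 'sim:' feature for each retrieved neighbour."""
    features = Counter(tokens)
    for token in tokens:
        for neighbour in nearest_neighbours(token, vectors, k, min_sim):
            features["sim:" + neighbour] += 1
    return features


# Hand-made 3-d vectors standing in for trained skip-gram embeddings.
VECTORS = {
    "nierfalen": [0.9, 0.1, 0.0],              # kidney failure
    "nierinsufficientie": [0.85, 0.15, 0.05],  # renal insufficiency
    "dialyse": [0.7, 0.3, 0.1],                # dialysis
    "borst": [0.1, 0.9, 0.2],                  # chest
    "koorts": [0.0, 0.2, 0.95],                # fever
}
```

With `min_sim=0.8`, "nierfalen" retrieves both "nierinsufficientie" and "dialyse". Dialysis is related to kidney failure but is not the same condition, so it is the kind of loose relation that could add noise to code prediction. Raising `min_sim` to 0.97 keeps only the near-synonym. A stricter cutoff trades coverage of related terms for fewer spurious or redundant features.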
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2017.04.007