Explainable clinical coding with in-domain adapted transformers

Bibliographic Details
Published in: Journal of Biomedical Informatics, Vol. 139, p. 104323
Main Authors: López-García, Guillermo; Jerez, José M.; Ribelles, Nuria; Alba, Emilio; Veredas, Francisco J.
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 01.03.2023
More Information
Summary:
• Automatic clinical coding is crucial for extracting information from medical records.
• Most of the existing computer-based methods for clinical coding act as “black boxes”.
• We have developed two different approaches to tackle explainable clinical coding using transformers.
• In-domain adapted transformers following our hierarchical-task strategy establish new state-of-the-art performance for explainable clinical-coding tasks.
• The proposed methodology can also be applied to other medical tasks that involve both the detection and the normalization of clinical entities.

Automatic clinical coding is a crucial task in the process of extracting relevant information from the unstructured medical documents contained in Electronic Health Records (EHR). However, most of the existing computer-based methods for clinical coding act as “black boxes”: they do not explain why each code was assigned, which greatly limits their applicability to real-world medical scenarios. The objective of this study is to use transformer-based models to tackle explainable clinical coding effectively. That is, we require the models not only to assign clinical codes to medical cases but also to provide the reference in the text that justifies each code assignment. We examine the performance of 3 transformer-based architectures on 3 different explainable clinical-coding tasks. For each transformer, we compare the original general-domain version with an in-domain version of the model adapted to the specificities of the medical domain. We frame explainable clinical coding as a dual task: medical named entity recognition (MER) and medical named entity normalization (MEN). To this end, we developed two approaches: a multi-task strategy and a hierarchical-task strategy.
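A rough sketch of the two-stage (hierarchical) reading of the MER/MEN formulation follows, under heavy assumptions. The MER step is assumed to emit one BIO tag per token, and the MEN step is assumed to be a text classifier that sees each mention wrapped in marker tokens together with a few words of surrounding context. Both models are stubbed out here, whereas the study uses transformers. The names `decode_bio`, `mark_mention` and `hierarchical_coding`, the `[E]`/`[/E]` markers, the `window` parameter and the placeholder label `CODE_A` are all hypothetical. Token indices stand in for the character offsets a real system would report as the textual reference for each code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, mention_text)
Span = Tuple[int, int, str]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """MER stage (assumed BIO scheme): group tagged tokens into mention spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new mention begins; close any mention that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O" and start is not None:
            # Leaving a mention.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        # An "I" tag with an open mention simply extends it.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


def mark_mention(tokens: List[str], span: Span, window: int = 5) -> str:
    """Build a context-aware MEN input: the mention wrapped in markers plus nearby words."""
    start, end, _ = span
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return " ".join(left + ["[E]"] + tokens[start:end] + ["[/E]"] + right)


def hierarchical_coding(
    tokens: List[str],
    tags: List[str],
    classify: Callable[[str], str],
    window: int = 5,
) -> List[Tuple[str, int, int, str]]:
    """MER first, then MEN on each detected mention; returns (code, start, end, mention)."""
    results = []
    for span in decode_bio(tokens, tags):
        code = classify(mark_mention(tokens, span, window))
        results.append((code, span[0], span[1], span[2]))
    return results


if __name__ == "__main__":
    tokens = ["Patient", "with", "infiltrating", "ductal", "carcinoma", "of", "the", "breast"]
    tags = ["O", "O", "B", "I", "I", "O", "O", "O"]  # stand-in for MER model output
    # Stand-in for the MEN classifier; "CODE_A" is a placeholder, not a real code.
    toy_classifier = lambda text: "CODE_A" if "carcinoma" in text else "UNKNOWN"
    print(mark_mention(tokens, decode_bio(tokens, tags)[0], window=2))
    # Patient with [E] infiltrating ductal carcinoma [/E] of the
    print(hierarchical_coding(tokens, tags, toy_classifier))
    # [('CODE_A', 2, 5, 'infiltrating ductal carcinoma')]
```

Because each predicted code carries the span it was derived from, every code assignment comes paired with the text that justifies it. Keeping the two stages separate also means the normalization classifier only has to choose a code for a mention that has already been located.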
For each transformer analyzed, the clinical-domain version significantly outperforms the corresponding general-domain model across the 3 explainable clinical-coding tasks. Furthermore, the hierarchical-task approach performs significantly better than the multi-task strategy. Specifically, combining the hierarchical-task strategy with an ensemble of the 3 clinical-domain transformers yields the best results, with precision, recall and F1-score of 0.852, 0.847 and 0.849 on the Cantemist-Norm task and 0.718, 0.566 and 0.633 on the CodiEsp-X task, respectively. By addressing the MER and MEN tasks separately, and by treating MEN as a context-aware text-classification problem, the hierarchical-task approach reduces the intrinsic complexity of explainable clinical coding, allowing the transformers to set new state-of-the-art (SOTA) results on the tasks considered in this study. The proposed methodology could also be applied to other clinical tasks that require both the recognition and the normalization of medical entities.
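The F1-score is the harmonic mean of precision and recall, so it always lies between the two, and an F1 larger than both is impossible. The quick check below recomputes F1 from each reported triple under two readings of its order, rounding to 3 decimals. Only the reading (precision, recall, F1) reproduces the remaining value in both tasks. The function name `f1_score` and the `reported` dictionary are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Triples exactly as listed in the abstract, per task.
reported = {
    "Cantemist-Norm": (0.852, 0.847, 0.849),
    "CodiEsp-X": (0.718, 0.566, 0.633),
}

for task, (a, b, c) in reported.items():
    # Reading 1: (precision, recall, F1) -> recompute F1 from a and b.
    p_r_f = round(f1_score(a, b), 3)
    # Reading 2: (F1, precision, recall) -> recompute F1 from b and c.
    f_p_r = round(f1_score(b, c), 3)
    print(f"{task}: P,R,F1 gives {p_r_f} (listed {c}); F1,P,R gives {f_p_r} (listed {a})")

# Cantemist-Norm: P,R,F1 gives 0.849 (listed 0.849); F1,P,R gives 0.848 (listed 0.852)
# CodiEsp-X: P,R,F1 gives 0.633 (listed 0.633); F1,P,R gives 0.598 (listed 0.718)
```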
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2023.104323