Enhanced ICD-10 code assignment of clinical texts: A summarization-based approach


Bibliographic Details
Published in Artificial Intelligence in Medicine Vol. 156; p. 102967
Main Authors Sun, Yaoqian, Sang, Lei, Wu, Dan, He, Shilin, Chen, Yani, Duan, Huilong, Chen, Han, Lu, Xudong
Format Journal Article
Language English
Published Netherlands Elsevier B.V. 01.10.2024
Summary: Assigning International Classification of Diseases (ICD) codes to clinical texts is a common and crucial practice in patient classification, hospital management, and subsequent statistical analysis. Current auto-coding methods mainly cast this task as a multi-label classification problem. Such solutions suffer from a high-dimensional mapping space and excessive redundant information in long clinical texts. To alleviate these problems, we introduce text summarization methods to ICD coding and apply text matching to select ICD codes. We focus on the tenth revision of the ICD (ICD-10) and design a novel summarization-based approach (SuM) with an end-to-end strategy to efficiently assign ICD-10 codes to clinical texts. In this approach, a knowledge-guided pointer network is proposed to distill and summarize the key information in clinical texts precisely. A matching model with a matching-aggregation architecture then aligns the summary with candidate codes, turning the one-vs-all scenario into one-vs-one matching and thereby avoiding the large-label-space obstacle inherent in classification approaches. A total of 12,788 ICD-10 coded discharge summaries from a Chinese hospital were collected to evaluate the proposed approach. Compared with existing methods, the proposed model achieves the best coding results, with a Micro AUC of 0.9548, MRR@10 of 0.7977, Precision@10 of 0.0944, and Recall@10 of 0.9439 on the TOP-50 dataset. Results on the FULL dataset are consistent. In addition, the proposed knowledge encoder and the end-to-end strategy are shown to help the whole model select the most suitable code. The proposed automatic ICD-10 code assignment approach via text summarization can effectively capture critical information in long clinical texts and improve the performance of ICD-10 coding of clinical texts.
• The text summarization method is used for ICD-10 coding with better distillation of key information in long clinical texts.
• With the guidance of the main diagnosis, the summary of the long clinical text is precise and applicable for code matching.
• The text matching method facilitates ICD-10 coding with interactive features and the one-vs-one operation.
• A novel end-to-end model of the summarization module and text matching module is proposed.
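
The ranking metrics quoted in the summary (MRR@10, Precision@10, Recall@10) can be illustrated with a minimal sketch. The function names, the example codes, and the ranked list below are hypothetical and are not taken from the article. The sketch also assumes a single gold code per document, which the summary does not state explicitly. Under that assumption Precision@10 cannot exceed 0.1, which would be consistent with the reported Precision@10 of 0.0944 sitting at roughly one tenth of the reported Recall@10 of 0.9439.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Fraction of the k ranked slots that hold a gold code."""
    hits = sum(1 for code in ranked[:k] if code in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Fraction of the gold codes that appear in the top k."""
    hits = sum(1 for code in ranked[:k] if code in gold)
    return hits / len(gold)


def mrr_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first gold code in the top k, or 0.0 if none."""
    for rank, code in enumerate(ranked[:k], start=1):
        if code in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked output of a matching model for one discharge summary.
ranked_codes = ["I10", "E11.9", "J18.9", "I25.1", "K29.7",
                "N18.9", "J44.9", "I63.9", "C34.9", "K80.2"]
gold_codes = {"E11.9"}  # assumed single gold code (illustrative only)

p10 = precision_at_k(ranked_codes, gold_codes)
r10 = recall_at_k(ranked_codes, gold_codes)
mrr10 = mrr_at_k(ranked_codes, gold_codes)
```

Averaging these per-document values over a test set gives corpus-level figures of the kind reported above; how the article itself aggregates them is not specified in this record.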
ISSN: 0933-3657
1873-2860
DOI: 10.1016/j.artmed.2024.102967