Automated ICD coding using extreme multi-label long text transformer-based models

Bibliographic Details
Published in: Artificial Intelligence in Medicine, Vol. 144, p. 102662
Main Authors: Liu, Leibo; Perez-Concha, Oscar; Nguyen, Anthony; Bennett, Vicki; Jorm, Louisa
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2023
Summary: Encouraged by the success of pretrained Transformer models in many natural language processing tasks, their use for International Classification of Diseases (ICD) coding tasks is now being actively explored. In this study, we investigated two existing Transformer-based models (PLM-ICD and XR-Transformer) and proposed a novel Transformer-based model (XR-LAT), aiming to address the extreme label set and long text classification challenges posed by automated ICD coding. PLM-ICD, which currently holds the state-of-the-art (SOTA) performance on the ICD coding benchmark datasets MIMIC-III and MIMIC-II, was selected as our baseline model and further optimised on both datasets. In addition, we extended XR-Transformer, the leading model in the general extreme multi-label text classification domain, to support longer sequences and trained it on both datasets. Finally, we proposed and trained the novel model XR-LAT on both datasets: a chain of models trained recursively over a predefined hierarchical code tree, with label-wise attention, knowledge transfer and dynamic negative sampling mechanisms. Our optimised PLM-ICD models, trained with longer total and chunk sequence lengths, significantly outperformed the current SOTA PLM-ICD models and achieved the highest micro-F1 scores of 60.8% and 50.9% on MIMIC-III and MIMIC-II, respectively. The XR-Transformer model, although SOTA in the general domain, did not perform well across all metrics. The best XR-LAT based models obtained results competitive with the current SOTA PLM-ICD models, including improvements in macro-AUC of 2.1% and 5.1% on MIMIC-III and MIMIC-II, respectively. Our optimised PLM-ICD models are the new SOTA models for automated ICD coding on both datasets, while our novel XR-LAT models perform competitively with the previous SOTA PLM-ICD models.

Highlights:
• Automated ICD coding is an extreme multi-label long text classification problem.
• An extremely large label set and long input texts are the two main challenges of ICD coding.
• Three types of Transformer-based models were explored for ICD coding tasks.
• The state-of-the-art model for extreme multi-label text classification in the general domain was investigated for ICD coding.
• A novel Transformer-based model with recursive training was proposed.
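The chunked long-text handling and label-wise attention that the summary attributes to these models can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption, not the authors' published code: the class name LabelWiseAttention, the dimensions, and the random stand-ins for encoder outputs are all hypothetical. The idea is that a clinical note longer than the Transformer's maximum sequence length is split into chunks, each chunk is encoded separately, the hidden states are concatenated, and each ICD code then attends to the tokens most relevant to it before being scored.

import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    """Scores every ICD code with its own attention-pooled document vector."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One learnable query per label, so each code can attend to the
        # tokens of the note that are most relevant to it.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size))
        # Per-label binary classifiers over the pooled representations.
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, tokens, hidden), i.e. the concatenated
        # encoder outputs of all chunks of one long clinical note.
        scores = torch.einsum("bth,lh->btl", hidden_states, self.label_queries)
        attention = torch.softmax(scores, dim=1)  # normalise over tokens
        pooled = torch.einsum("btl,bth->blh", attention, hidden_states)
        # Label l is scored against its own pooled vector: w_l . v_l + b_l.
        logits = torch.einsum("blh,lh->bl", pooled, self.classifier.weight)
        return logits + self.classifier.bias

# Toy usage: random tensors stand in for the per-chunk outputs of a
# pretrained encoder; a real pipeline would encode each chunk separately
# and concatenate the hidden states before pooling.
batch, chunks, chunk_len, hidden, num_labels = 2, 4, 128, 768, 50
chunk_states = torch.randn(batch, chunks, chunk_len, hidden)
doc_states = chunk_states.reshape(batch, chunks * chunk_len, hidden)
head = LabelWiseAttention(hidden, num_labels)
print(head(doc_states).shape)  # torch.Size([2, 50])

Because the attention weights are computed per label, each code receives a different weighted summary of the same token sequence, which is what allows a single long document to support thousands of largely independent code decisions.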
ISSN: 0933-3657, 1873-2860
DOI: 10.1016/j.artmed.2023.102662