Contextual transformer sequence-based recognition network for medical examination reports

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), Vol. 53, no. 14, pp. 17363-17380
Main Authors: Wan, Honglin; Zhong, Zongfeng; Li, Tianping; Zhang, Huaxiang; Sun, Jiande
Format: Journal Article
Language: English
Published: New York, Springer US, 01.07.2023
Springer Nature B.V.
Summary: The automatic recognition of the medical examination report table (MERT) has received increasing attention in recent years, as it is an essential step toward intelligent healthcare and medical treatment. However, table prediction still faces challenges in practical use. In this paper, a recognition network (CoT_SRN) for medical examination reports is proposed to improve the recognition accuracy of MERT structure and to reconstruct the image as a spreadsheet. The network is based on a contextual transformer sequence and consists of a CoT encoder and an SRN decoder. In the encoder, a CNN backbone based on the Contextual Transformer (CoT) proposed in this paper is constructed to extract structural features from the MERT image. In the decoder, an attention head with a gated recurrent unit (GRU) is used for feature sequence recognition to obtain the cell locations and the table structure, represented in a structured language. In addition, MERT structure labels are defined in a character-level HTML format and are used in training the table structure recognition. The proposed method achieves competitive tree-edit-distance-based similarity (TEDS) scores on English datasets, such as PubTabNet and SciTSR, and on Chinese datasets, such as the Chinese medical document dataset (CMDD). This demonstrates that CoT_SRN helps preserve good performance across multi-language MERT structure recognition. Additionally, the performance of the proposed method is verified on practical examples with folds and small-angle deflection. The experimental results show that the proposed method is promising for practical application.
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-04420-4
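
Note on the summary: the abstract mentions HTML structure labels and TEDS scores without showing either. The sketch below is only an illustration, not the authors' code. It assumes a table's structure can be written as a sequence of HTML tags with the cell text dropped, and it applies the standard TEDS definition, TEDS = 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|), where |T| is the number of nodes in a tree. The names structure_tokens and teds are hypothetical, and the tree edit distance itself is assumed to be computed elsewhere.

```python
import re

# Assumption: a structure label is the sequence of HTML tags only;
# this tag-level split is illustrative and may differ from the paper's format.
TAG_RE = re.compile(r"</?[a-zA-Z]+[^>]*>")


def structure_tokens(html: str) -> list[str]:
    """Return the HTML tags of a table in order, ignoring cell text."""
    return TAG_RE.findall(html)


def teds(edit_distance: float, nodes_a: int, nodes_b: int) -> float:
    """Standard TEDS: 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|).

    edit_distance is a precomputed tree edit distance between the two
    table trees; nodes_a and nodes_b are their node counts.
    """
    largest = max(nodes_a, nodes_b)
    if largest == 0:
        return 1.0  # two empty trees are treated as identical
    return 1.0 - edit_distance / largest


if __name__ == "__main__":
    row = '<table><tr><td>WBC</td><td colspan="2">5.1</td></tr></table>'
    print(structure_tokens(row))
    print(teds(2, 8, 10))
```

A score of 1.0 means the predicted and reference trees are identical; lower values indicate more structural edits relative to the larger tree.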