A Multiple Feature Approach for Disorder Normalization in Clinical Notes

Bibliographic Details
Published in	Wuhan University journal of natural sciences Vol. 21; no. 6; pp. 482 - 490
Main Author	Lü Chen; CHEN Bo; Lü Chaozhen; QIU Likun; JI Donghong
Format	Journal Article
Language	English
Published	Wuhan: Wuhan University; Springer Nature B.V., 01.12.2016
Subjects	Biomedical and Life Sciences; Classification; Computer Science; Life Sciences; Materials Science; Mathematical analysis; Similarity; Levenshtein distance; TP391; natural language processing; disorder normalization; semantic composition; multiple features
Summary:	In this paper we propose a multiple feature approach for the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) concept unique identifier (CUI). We develop a two-step method that first acquires a list of candidate CUIs and their associated preferred names using the UMLS API, and then chooses the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
Bibliography:	42-1405/N
ISSN:	1007-1202 1993-4998
DOI:	10.1007/s11859-016-1200-7
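
The two-step method described in the Summary (collect candidate CUIs with preferred names, then pick the candidate closest to the mention) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper formulates the selection as a classification problem over several feature types, whereas the sketch below uses a single string feature, normalized Levenshtein similarity, and simply takes the highest-scoring candidate. The candidate list is hard-coded rather than fetched from the UMLS API, and the identifiers `CUI_A` and `CUI_B`, as well as the function names, are hypothetical placeholders.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(mention: str, name: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical after lowercasing."""
    m, n = mention.lower().strip(), name.lower().strip()
    longest = max(len(m), len(n))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(m, n) / longest


def pick_closest(
    mention: str, candidates: List[Tuple[str, str]]
) -> Optional[Tuple[str, str, float]]:
    """Return (cui, preferred_name, score) for the best candidate, or None."""
    best: Optional[Tuple[str, str, float]] = None
    for cui, name in candidates:
        score = similarity(mention, name)
        if best is None or score > best[2]:
            best = (cui, name, score)
    return best


# Hypothetical candidates; in the paper these come from the UMLS API.
candidates = [("CUI_A", "Heart attack"), ("CUI_B", "Heart failure")]
result = pick_closest("heart atack", candidates)
```

In a setting closer to the one the Summary describes, a score like this would be one input among the string, ranking, similarity, and contextual features passed to a classifier, rather than the sole decision criterion.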