Development of a medical-text parsing algorithm based on character adjacent probability distribution for Japanese radiology reports

Bibliographic Details
Published in Methods of Information in Medicine Vol. 47; no. 6; p. 513
Main Authors Nishimoto, N, Terae, S, Uesugi, M, Ogasawara, K, Sakurai, T
Format Journal Article
Language English
Published Germany 01.01.2008
Summary: The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts. Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the Metathesaurus in the Unified Medical Language System (UMLS, 2005AA), were added as a medical dictionary for ChaSen. A radiographer corrected the set of results containing 300 parsed CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences. We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probability as the conditional probability of a uni-gram, bi-gram, and tri-gram. The highest transitional probability, P(C_i | C_{i-2} C_{i-1}), was 1.00. As an example of an anatomical location, the term "pulmonary hilum" was parsed as a tri-gram. Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.
ISSN: 0026-1270
DOI: 10.3414/me9127
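
The summary describes the transitional probability as a conditional probability over character uni-grams, bi-grams, and tri-grams. A minimal sketch of one way such a quantity could be estimated is given below, using relative frequencies of character n-grams. This is not the authors' implementation: the function name `transitional_probabilities` and the two toy strings are hypothetical and chosen only for illustration, and the record does not state how the study handled smoothing, sentence boundaries, or ChaSen output.

```python
from collections import Counter


def transitional_probabilities(texts, n=3):
    """Estimate P(c_i | previous n-1 characters) by relative frequency.

    Hypothetical helper for illustration only; not the method used in the study.
    Returns a dict mapping each observed character n-gram to the share of
    occurrences of its (n-1)-character context that are followed by its
    final character. For n=1 the context is empty, so the result is the
    plain relative frequency of each character.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ngram_counts = Counter()
    context_counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            ngram_counts[gram] += 1
            # Count the context only where a following character exists,
            # so the probabilities for each context sum to 1.
            context_counts[gram[:-1]] += 1
    return {
        gram: count / context_counts[gram[:-1]]
        for gram, count in ngram_counts.items()
    }


# Toy strings (assumed examples, not data from the study).
toy_corpus = ["肺門部", "肺門リンパ節"]
bigram_probs = transitional_probabilities(toy_corpus, n=2)
trigram_probs = transitional_probabilities(toy_corpus, n=3)
print(bigram_probs["肺門"])     # P(門 | 肺)
print(trigram_probs["肺門部"])  # P(部 | 肺門)
```

Under this kind of estimate, a value near 1.00 means the context is almost always followed by the same character, which suggests the characters belong to one term, while a lower value suggests a possible term boundary after the context.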