Development of a medical-text parsing algorithm based on character adjacent probability distribution for Japanese radiology reports
Published in | Methods of information in medicine Vol. 47; no. 6; p. 513 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Germany 01.01.2008 |
Subjects | |
Summary: | The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts.
Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the metathesaurus in the Unified Medical Language System (UMLS, 2005AA), were added as a medical dictionary for ChaSen. A radiographer corrected the set of results containing 300 parsed CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences.
We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probability as the conditional probability of a uni-gram, bi-gram, and tri-gram. The highest transitional probability, P(C_i | C_{i-2} C_{i-1}), was 1.00. As an example of an anatomical location, the term "pulmonary hilum" was parsed as a tri-gram.
Retrieval of transitional probability will improve the accuracy of parsing compound medical terms. |
---|---|
ISSN: | 0026-1270 |
DOI: | 10.3414/me9127 |
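
The summary describes the transitional probability as a conditional probability over character uni-grams, bi-grams, and tri-grams. The following is a minimal sketch of how such a quantity could be estimated by simple counting. It is not the authors' implementation: the function name `transition_probabilities`, the toy term list, and the choice to ignore term-end boundaries are all assumptions made for illustration, and the toy data are not taken from the study.

```python
from collections import Counter


def transition_probabilities(terms, n):
    """Estimate P(c_i | preceding n-1 characters) by relative frequency.

    Hypothetical helper, not from the paper. With n=1 the context is the
    empty string, so the result is the plain character frequency.
    Term-end boundaries are not modeled in this sketch.
    """
    context_counts = Counter()
    ngram_counts = Counter()
    for term in terms:
        for i in range(n - 1, len(term)):
            context = term[i - (n - 1):i]  # the n-1 characters before c_i
            ngram_counts[(context, term[i])] += 1
            context_counts[context] += 1
    return {
        (context, char): count / context_counts[context]
        for (context, char), count in ngram_counts.items()
    }


# Illustrative toy terms only (assumed, not data from the study).
toy_terms = ["肺門部", "肺門", "肺野", "肺門部"]

unigram = transition_probabilities(toy_terms, 1)
bigram = transition_probabilities(toy_terms, 2)
trigram = transition_probabilities(toy_terms, 3)

for (context, char), p in sorted(trigram.items()):
    print(f"P({char} | {context}) = {p:.2f}")
```

In this toy list, every occurrence of the two-character context that is followed by another character is followed by the same one, so the tri-gram estimate for that pair reaches 1.00. A real estimate would need a corrected corpus such as the one described in the summary, and probably explicit boundary markers.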