Free-text medical document retrieval via phrase-based vector space model
Published in | Proceedings - AMIA Symposium, pp. 489-493 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | United States: American Medical Informatics Association, 2002 |
Subjects | |
Summary: | Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as index terms to improve retrieval accuracy. However, past research revealed that such concept-based systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources has the potential to improve retrieval accuracy, yet the incompleteness of the knowledge sources precludes significant improvement. To remedy this problem, we propose representing documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. Document similarity can in turn be derived from phrase similarities. Using OHSUMED as the test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model. |
---|---|
ISSN: | 1531-605X |
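
The summary describes phrase similarity as a joint function of conceptual similarity and shared word stems, with document similarity derived from the phrase similarities. The following is only a minimal sketch of that idea, not the authors' formulation: the `Phrase` type, the toy `CONCEPT_SIM` table, the concept identifiers, the weighted sum with `alpha`, and the soft-cosine style normalization are all assumptions made for illustration. A real system would draw concept similarities from a knowledge source such as UMLS.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    concepts: frozenset  # hypothetical concept identifiers
    stems: frozenset     # word stems that make up the phrase


# Toy concept-similarity table (assumption, not taken from UMLS).
CONCEPT_SIM = {
    frozenset({"C_heart_attack", "C_myocardial_infarction"}): 0.9,
}


def concept_similarity(a, b):
    """Similarity of two concept identifiers; identical concepts score 1.0."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get(frozenset({a, b}), 0.0)


def phrase_similarity(p, q, alpha=0.5):
    """Weighted mix of best concept match and stem overlap (assumed form)."""
    best = max(
        (concept_similarity(a, b) for a in p.concepts for b in q.concepts),
        default=0.0,
    )
    union = p.stems | q.stems
    overlap = len(p.stems & q.stems) / len(union) if union else 0.0
    return alpha * best + (1 - alpha) * overlap


def document_similarity(d1, d2, alpha=0.5):
    """Soft-cosine style score over all phrase pairs (assumed form)."""
    def soft_dot(x, y):
        return sum(phrase_similarity(p, q, alpha) for p in x for q in y)

    denom = math.sqrt(soft_dot(d1, d1) * soft_dot(d2, d2))
    return soft_dot(d1, d2) / denom if denom else 0.0


if __name__ == "__main__":
    p1 = Phrase(frozenset({"C_heart_attack"}), frozenset({"heart", "attack"}))
    p2 = Phrase(frozenset({"C_myocardial_infarction"}),
                frozenset({"myocardi", "infarct"}))
    print(phrase_similarity(p1, p2))
    print(document_similarity([p1], [p2]))
```

In this sketch the two phrases share no stems, so a purely stem-based comparison would score them as unrelated; the nonzero score comes entirely from the assumed concept-similarity entry, which is the kind of effect the summary attributes to combining concepts and stems.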