Improving English and Chinese Ad-Hoc Retrieval: A Tipster Text Phase 3 Project Report
Published in | Information retrieval (Boston) Vol. 3; no. 4; pp. 313 - 338 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Dordrecht, Springer Nature B.V, 01.12.2000 |
Subjects | |
Summary: | Both English and Chinese ad-hoc information retrieval were investigated in this Tipster 3 project. One of our objectives was to study the use of various term-level and phrasal-level evidence to improve retrieval accuracy. For short queries, we studied five term-level techniques that together can lead to good improvements over standard ad-hoc 2-stage retrieval in the TREC5-8 experiments. For long queries, we studied the use of linguistic phrases to re-rank retrieval lists; the effect is small but consistently positive. For Chinese IR, we investigated three simple representations for documents and queries: short-words, bigrams and characters. Either approximate short-word segmentation or bigrams, augmented with characters, gives highly effective results. Accurate word segmentation does not appear to be crucial for the overall result of a query set. Character indexing by itself is not competitive. Additional improvements may be obtained using collection enrichment and combination of retrieval lists. Our PIRCS document-focused retrieval is also shown to be similar to a simple language model approach to IR. |
ISSN: | 1386-4564 1573-7659 |
DOI: | 10.1023/A:1009955715597 |
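
The summary describes representing Chinese text by bigrams augmented with single characters. As a rough illustration only, the sketch below produces overlapping character bigrams plus the individual characters for each unbroken run of Chinese text. It is not the PIRCS implementation: the function name `bigram_and_char_terms` is hypothetical, and the Unicode range used to detect Chinese characters (the basic CJK Unified Ideographs block) is an assumption.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_and_char_terms(text):
    """Return overlapping character bigrams plus single characters.

    Hypothetical helper for illustration; bigrams never span punctuation,
    whitespace, or non-Chinese text because each run is handled separately.
    """
    terms = []
    for run in CJK_RUN.findall(text):
        # Overlapping bigrams within the run: c1c2, c2c3, ...
        terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        # Augment with the single characters of the same run.
        terms.extend(run)
    return terms


if __name__ == "__main__":
    print(bigram_and_char_terms("信息检索"))
```

No word segmentation is needed for this representation, which is consistent with the summary's remark that accurate segmentation does not appear to be crucial.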