Improving English and Chinese Ad-Hoc Retrieval: A Tipster Text Phase 3 Project Report

Bibliographic Details
Published in: Information Retrieval (Boston), Vol. 3, no. 4, pp. 313-338
Main Author: Kwok, K. L.
Format: Journal Article
Language: English
Published: Dordrecht: Springer Nature B.V., 01.12.2000

More Information
Summary: Both English and Chinese ad-hoc information retrieval were investigated in this Tipster 3 project. Part of our objective was to study the use of various term-level and phrasal-level evidence to improve retrieval accuracy. For short queries, we studied five term-level techniques that together can lead to good improvements over standard ad-hoc two-stage retrieval in the TREC-5 to TREC-8 experiments. For long queries, we studied the use of linguistic phrases to re-rank retrieval lists; the effect is small but consistently positive. For Chinese IR, we investigated three simple representations for documents and queries: short-words, bigrams and characters. Either approximate short-word segmentation or bigrams, augmented with characters, gives highly effective results. Accurate word segmentation does not appear to be crucial for the overall result of a query set. Character indexing by itself is not competitive. Additional improvements may be obtained using collection enrichment and combination of retrieval lists. Our PIRCS document-focused retrieval is also shown to be similar to a simple language-model approach to IR. [PUBLICATION ABSTRACT]
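The abstract names the "bigrams augmented with characters" representation for Chinese without describing how the index terms are produced. The following is a minimal sketch, not the PIRCS implementation, assuming that overlapping two-character bigrams are generated only within runs of CJK ideographs (the basic block U+4E00 to U+9FFF) and never across punctuation, and that every single character is added as a term as well. The function name `bigram_char_terms` is hypothetical.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs, basic block only (an assumption for this sketch).
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_char_terms(text: str) -> Counter:
    """Return term counts: overlapping character bigrams plus single characters.

    Hypothetical illustration of a bigram + character representation;
    bigrams never span punctuation or non-CJK text.
    """
    terms = Counter()
    for run in _CJK_RUN.findall(text):
        # Single characters give fallback matching evidence when bigrams miss.
        terms.update(run)
        # Overlapping bigrams approximate two-character words without a segmenter.
        terms.update(run[i:i + 2] for i in range(len(run) - 1))
    return terms


if __name__ == "__main__":
    print(bigram_char_terms("中文信息检索"))
```

For "中文信息检索" this yields the six characters plus the five bigrams 中文, 文信, 信息, 息检 and 检索. Some bigrams (such as 文信) are not real words; the abstract's finding that accurate segmentation is not crucial suggests such noise is tolerable, while the single characters recover matches that bigrams alone would miss.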
ISSN: 1386-4564; 1573-7659
DOI: 10.1023/A:1009955715597