Filtered document retrieval with frequency-sorted indexes

Bibliographic Details
Published in: Journal of the American Society for Information Science and Technology Vol. 47; no. 10; p. 749
Main Authors: Persin, Michael; Zobel, Justin; Sacks-Davis, Ron
Format: Journal Article
Language: English
Published: Hoboken: Wiley Periodicals Inc, 01.10.1996
Summary: Ranking techniques are effective at finding answers in document collections but can be expensive to evaluate. An evaluation technique is proposed that reduces costs by recognizing early which documents are likely to be highly ranked; for the test data, queries are evaluated in 2% of the memory of the standard implementation without degradation in retrieval effectiveness. CPU time and disk traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. The principle of the index design is that inverted lists are sorted by decreasing within-document frequency rather than by document number; experimentally, this reduces CPU time and disk traffic to around one third of the original requirement. It is shown that frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
ISSN: 2330-1635, 2330-1643
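
The idea in the summary, inverted lists sorted by decreasing within-document frequency so that low-value postings can be skipped, can be illustrated with a small sketch. This is not the article's implementation. The function names, the two thresholds (`insert_at`, `add_at`), their values, and the simple frequency-times-idf weighting are all assumptions chosen for illustration. The point it shows is that once a posting falls below the lower threshold, the rest of a frequency-sorted list can be abandoned, and that a higher insertion threshold keeps the number of score accumulators (memory) small.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to postings (doc_id, freq), sorted by decreasing freq."""
    counts = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    # Frequency-sorted lists: highest within-document frequency first,
    # ties broken by document number for determinism.
    return {
        term: sorted(post.items(), key=lambda p: (-p[1], p[0]))
        for term, post in counts.items()
    }


def filtered_rank(index, n_docs, query, insert_at=2, add_at=1, k=10):
    """Rank documents, filtering postings by two hypothetical thresholds.

    insert_at: minimum freq for a posting to create a new accumulator.
    add_at:    minimum freq for a posting to update an existing accumulator;
               below it, the remainder of the list is skipped.
    Returns (top-k [(doc_id, score)], number of postings processed).
    """
    accumulators = {}
    postings_read = 0
    terms = [t for t in set(query.lower().split()) if t in index]
    # Assumed ordering: process rarer (shorter-list) terms first.
    terms.sort(key=lambda t: (len(index[t]), t))
    for term in terms:
        idf = math.log(1 + n_docs / len(index[term]))
        for doc_id, freq in index[term]:
            if freq < add_at:
                break  # list is frequency-sorted, so nothing useful remains
            postings_read += 1
            if doc_id in accumulators:
                accumulators[doc_id] += freq * idf
            elif freq >= insert_at:
                accumulators[doc_id] = freq * idf
    ranked = sorted(accumulators.items(), key=lambda p: (-p[1], p[0]))
    return ranked[:k], postings_read
```

Called as `filtered_rank(build_index(docs), len(docs), "some query")`, where `docs` maps document numbers to text, raising `add_at` reduces how many postings are processed and raising `insert_at` reduces how many accumulators are created.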