Improving Shard Selection for Selective Search

Bibliographic Details
Published in: Information Retrieval Technology, Vol. 10648, pp. 29-41
Main Authors: Chuang, Mon Shih; Kulkarni, Anagha
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2017
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 3319701444
9783319701448
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-70145-5_3

Summary: The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards) and searching only the few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for a query directly impacts Selective Search performance. We thus investigate three new approaches to the shard ranking problem, and three techniques for estimating how many of the top-ranked shards should be searched for a query (shard rank cutoff estimation). We learn a highly effective shard ranking model using the popular learning-to-rank framework. Another approach leverages the topical organization of the collection along with pseudo-relevance feedback (PRF) to improve search performance further. Empirical evaluation on a large collection demonstrates statistically significant improvements over strong baselines. Experiments also show that shard cutoff estimation is essential for balancing search precision and recall.