Improving Shard Selection for Selective Search
The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards), and searching only a few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for th...
Saved in:
Published in | Information Retrieval Technology Vol. 10648; pp. 29 - 41 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published |
Switzerland
Springer International Publishing AG
2017
Springer International Publishing |
Series | Lecture Notes in Computer Science |
Online Access | Get full text |
ISBN | 3319701444 9783319701448 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-70145-5_3 |
Cover
Summary: | The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards), and searching only a few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for the query, directly impacts Selective Search performance. We thus investigate three new approaches for the shard ranking problem, and three techniques to estimate how many of the top shards should be searched for a query (shard rank cutoff estimation). We learn a highly effective shard ranking model using the popular learning-to-rank framework. Another approach leverages the topical organization of the collection along with pseudo relevance feedback (PRF) to improve the search performance further. Empirical evaluation using a large collection demonstrates statistically significant improvements over strong baselines. Experiments also show that shard cutoff estimation is essential to balance search precision and recall. |
---|---|
ISBN: | 3319701444 9783319701448 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-319-70145-5_3 |