Improving Shard Selection for Selective Search

Bibliographic Details
Published in	Information Retrieval Technology Vol. 10648; pp. 29 - 41
Main Authors	Chuang, Mon Shih, Kulkarni, Anagha
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2017
Series	Lecture Notes in Computer Science
Online Access	Get full text
ISBN	3319701444 9783319701448
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-319-70145-5_3

Summary:	The Selective Search approach processes large document collections efficiently by partitioning the collection into topically homogeneous groups (shards) and searching only the few shards that are estimated to contain relevant documents for the query. The ability to identify the relevant shards for a query directly impacts Selective Search performance. We thus investigate three new approaches to the shard ranking problem, and three techniques to estimate how many of the top-ranked shards should be searched for a query (shard rank cutoff estimation). We learn a highly effective shard ranking model using the popular learning-to-rank framework. Another approach leverages the topical organization of the collection along with pseudo-relevance feedback (PRF) to improve search performance further. Empirical evaluation using a large collection demonstrates statistically significant improvements over strong baselines. Experiments also show that shard cutoff estimation is essential to balance search precision and recall.
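
Illustration:	The pipeline described in the summary (rank the shards for a query, cut the ranking off at some depth, search only the selected shards) can be sketched roughly as follows. This is a minimal toy sketch and not the chapter's method: the term-frequency shard score stands in for the learned ranking model, the fixed `cutoff` argument stands in for the cutoff estimators, and all names (`shard_score`, `rank_shards`, `selective_search`, `SHARDS`) and the sample data are hypothetical.

```python
from collections import Counter


def shard_score(query_terms, shard_docs):
    """Fraction of the shard's tokens that match a query term.

    Assumption: a simple stand-in for a real shard ranking model.
    """
    counts = Counter(
        tok for text in shard_docs.values() for tok in text.lower().split()
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[t] for t in query_terms) / total


def rank_shards(query_terms, shards):
    """Return shard ids ordered from most to least promising."""
    return sorted(
        shards,
        key=lambda sid: shard_score(query_terms, shards[sid]),
        reverse=True,
    )


def selective_search(query, shards, cutoff=1, top_n=10):
    """Search only the top `cutoff` shards; return (selected shards, doc ids)."""
    query_terms = query.lower().split()
    selected = rank_shards(query_terms, shards)[:cutoff]
    hits = []
    for sid in selected:
        for doc_id, text in shards[sid].items():
            tokens = text.lower().split()
            score = sum(tokens.count(t) for t in query_terms)
            if score > 0:
                hits.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return selected, [doc_id for _, doc_id in hits[:top_n]]


# Hypothetical two-shard collection, partitioned by topic.
SHARDS = {
    "sports": {"d1": "football match score goal", "d2": "tennis match final"},
    "cooking": {"d3": "pasta recipe sauce", "d4": "bread recipe flour"},
}

if __name__ == "__main__":
    print(selective_search("recipe", SHARDS, cutoff=1))
```

A larger `cutoff` searches more shards, which tends to raise recall at a higher search cost; this is the trade-off that shard rank cutoff estimation is meant to manage.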