Boosting Search Performance Using Query Variations

Bibliographic Details
Published in	arXiv.org
Main Authors	Benham, Rodger; Mackenzie, Joel; Moffat, Alistair; Culpepper, J Shane
Format	Paper Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 10.11.2020
Subjects	Algorithms Computation Computer Science - Information Retrieval Efficiency Queries Ranking Retrieval System effectiveness

More Information
Summary:	Rank fusion is a powerful technique that allows multiple sources of information to be combined into a single result set. However, to date fusion has not been regarded as being cost-effective in cases where strict per-query efficiency guarantees are required, such as in web search. In this work we propose a novel solution to rank fusion by splitting the computation into two parts -- one phase that is carried out offline to generate pre-computed centroid answers for queries with broadly similar information needs, and then a second online phase that uses the corresponding topic centroid to compute a result page for each query. We explore efficiency improvements to classic fusion algorithms whose costs can be amortized as a pre-processing step, and can then be combined with re-ranking approaches to dramatically improve effectiveness in multi-stage retrieval systems with little efficiency overhead at query time. Experimental results using the ClueWeb12B collection and the UQV100 query variations demonstrate that centroid-based approaches allow improved retrieval effectiveness at little or no loss in query throughput or latency, and with reasonable pre-processing requirements. We additionally show that queries that do not match any of the pre-computed clusters can be accurately identified and efficiently processed in our proposed ranking pipeline.
ISSN:	2331-8422
DOI:	10.48550/arxiv.1811.06147
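The summary describes a two-phase split: an offline phase that fuses the rankings of many query variations into a per-topic centroid, and an online phase that uses that stored centroid when a new query arrives. The sketch below illustrates this under several assumptions that are not taken from the paper. It uses reciprocal rank fusion (RRF, score 1 / (k + rank) with k = 60), one classic fusion method, as a stand-in for whichever fusion algorithm is used. It assumes the online step fuses the live query's own ranking with the stored centroid as a cheap two-list fusion. All names (`rrf_fuse`, `build_centroids`, `answer`, `search`, `topic_of`) and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Ranking = List[str]  # document ids, best first


def rrf_fuse(runs: List[Ranking], k: int = 60, depth: int = 1000) -> Ranking:
    """Reciprocal rank fusion (a stand-in, assumed here): each run adds 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run[:depth], start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_centroids(variations: Dict[str, List[str]],
                    search: Callable[[str], Ranking], k: int = 60) -> Dict[str, Ranking]:
    """Offline phase (hypothetical helper): run every known variation of a topic, fuse into one centroid."""
    return {topic: rrf_fuse([search(q) for q in qs], k) for topic, qs in variations.items()}


def answer(query: str, topic_of: Callable[[str], Optional[str]],
           centroids: Dict[str, Ranking], search: Callable[[str], Ranking],
           page_size: int = 10, k: int = 60) -> Ranking:
    """Online phase (assumed design): one live search, then a two-list fusion with the stored centroid."""
    live = search(query)
    topic = topic_of(query)
    if topic is None or topic not in centroids:
        # Query matches no pre-computed cluster: fall back to the plain single-query ranking.
        return live[:page_size]
    return rrf_fuse([live, centroids[topic]], k)[:page_size]


# Toy example: hypothetical per-query rankings standing in for a real first-stage retrieval system.
TOY_RUNS = {
    "cheap flights": ["a", "b", "c"],
    "low cost airfare": ["b", "a", "d"],
    "budget plane tickets": ["b", "c", "a"],
    "discount flights": ["c", "d"],
}


def search(query: str) -> Ranking:
    return TOY_RUNS.get(query, [])


def topic_of(query: str) -> Optional[str]:
    # Hypothetical query-to-cluster matcher; a real system would need a far better classifier.
    return "t1" if "flight" in query else None


centroids = build_centroids(
    {"t1": ["cheap flights", "low cost airfare", "budget plane tickets"]}, search)
print(centroids["t1"])                                          # ['b', 'a', 'c', 'd']
print(answer("discount flights", topic_of, centroids, search))  # ['c', 'd', 'b', 'a']
print(answer("low cost airfare", topic_of, centroids, search))  # fallback: ['b', 'a', 'd']
```

The design point the sketch is meant to show is that the expensive part (running and fusing many variations) happens once per topic, ahead of time, while each online query pays for only one search plus a fusion of two short lists.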