Pruning techniques for parallel processing of reverse top-k queries

Bibliographic Details
Published in: Distributed and Parallel Databases: An International Journal, Vol. 39, No. 1, pp. 169–199
Main Authors: Nikitopoulos, Panagiotis; Sfyris, Georgios A.; Vlachou, Akrivi; Doulkeridis, Christos; Telelis, Orestis
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.03.2021

Summary: In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature results in prohibitively expensive processing costs when applied to vast data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large distributed input data sets in reasonable execution time. We present an algorithmic framework for the problem, targeting a generic parallel setting, in which different algorithms can be instantiated. As an instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments demonstrating the improved efficiency of DiPaRT+ on data sets that are four orders of magnitude larger than those handled by centralized approaches.
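To make the query semantics concrete, the following is a minimal brute-force sketch in Python of the reverse top-k definition given in the summary; it is not the paper's DiPaRT or DiPaRT+ algorithm and applies no pruning. It assumes a linear scoring function (weighted sum of attributes) and that lower scores rank higher; the function names (`score`, `reverse_topk`) are illustrative.

```python
def score(weights, obj):
    """Linear scoring function: weighted sum of an object's attributes."""
    return sum(w * x for w, x in zip(weights, obj))

def reverse_topk(objects, preferences, q, k):
    """Brute-force reverse top-k: return the preference (weight) vectors
    for which the query object q ranks among the k best-scoring objects.
    Assumes lower scores rank higher."""
    result = []
    for w in preferences:
        q_score = score(w, q)
        # Count database objects that score strictly better than q under w.
        better = sum(1 for obj in objects if score(w, obj) < q_score)
        if better < k:  # q is within the top-k for this preference
            result.append(w)
    return result

# Toy example: 2-D objects and preference vectors.
objects = [(1.0, 5.0), (4.0, 1.0), (3.0, 3.0)]
preferences = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
q = (2.0, 2.0)
print(reverse_topk(objects, preferences, q, k=1))  # -> [(0.5, 0.5)]
```

This naive version evaluates every preference against every object, which is exactly the CPU-intensive cost that motivates the pruning properties and parallel (MapReduce) processing studied in the paper.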
ISSN: 0926-8782
EISSN: 1573-7578
DOI: 10.1007/s10619-020-07297-9