Pruning techniques for parallel processing of reverse top-k queries
Published in: Distributed and Parallel Databases: An International Journal, Vol. 39, No. 1, pp. 169–199
Format: Journal Article
Language: English
Published: New York: Springer US, 01.03.2021 (Springer Nature B.V.)
Summary: In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature results in prohibitively expensive processing cost when applied to vast-sized data sets. This limitation motivates us to explore a scalable parallel processing solution, in order to enable reverse top-k processing over large distributed input data sets in reasonable execution time. We present an algorithmic framework for the problem, targeting a generic parallel setting, in which different algorithms can be instantiated. As an instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments demonstrating the improved efficiency of DiPaRT+ on data sets that are four orders of magnitude larger than those handled by centralized approaches.
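To make the query definition in the summary concrete, the following is a minimal sketch of a naive (centralized, brute-force) reverse top-k evaluation. It assumes the common setting from the reverse top-k literature, where each user preference is a linear weight vector and top-k means the k objects with the lowest weighted score; the function names and minimization semantics are illustrative assumptions, not the paper's DiPaRT/DiPaRT+ algorithms, which parallelize and prune this computation.

```python
from typing import List, Sequence, Tuple

def score(point: Sequence[float], weights: Sequence[float]) -> float:
    # Linear scoring function: weighted sum of the object's attributes
    # (lower score = better rank, a common convention in this literature).
    return sum(w * x for w, x in zip(weights, point))

def reverse_top_k(database: List[Tuple[float, ...]],
                  preferences: List[Tuple[float, ...]],
                  q: Tuple[float, ...],
                  k: int) -> List[Tuple[float, ...]]:
    """Return the user preferences (weight vectors) for which the
    query object q belongs to the top-k results of the database."""
    result = []
    for w in preferences:
        q_score = score(q, w)
        # q is in the top-k under w iff fewer than k database objects
        # score strictly better (lower) than q.
        better = sum(1 for p in database if score(p, w) < q_score)
        if better < k:
            result.append(w)
    return result
```

This baseline costs O(|D| · |W|) score evaluations, which illustrates why the operator is CPU-intensive on vast data sets and why the paper's pruning properties and MapReduce parallelization matter.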
ISSN: 0926-8782, 1573-7578
DOI: 10.1007/s10619-020-07297-9