Pruning techniques for parallel processing of reverse top-k queries

Bibliographic Details
Published in: Distributed and Parallel Databases: An International Journal, Vol. 39, No. 1, pp. 169–199
Main Authors: Nikitopoulos, Panagiotis; Sfyris, Georgios A.; Vlachou, Akrivi; Doulkeridis, Christos; Telelis, Orestis
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.03.2021

Summary: In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although the reverse top-k query operator has recently been studied extensively, its CPU-intensive nature results in prohibitively expensive processing costs when applied to vast data sets. This limitation motivates us to explore a scalable parallel processing solution that enables reverse top-k processing over large distributed input data sets in reasonable execution time. We present an algorithmic framework for the problem, targeting a generic parallel setting, in which different algorithms can be instantiated. As an instantiation of the framework, we describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct. Furthermore, we introduce novel pruning properties for the problem and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments demonstrating the improved efficiency of DiPaRT+ on data sets that are four orders of magnitude larger than those handled by centralized approaches.
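To make the query semantics concrete, the following is a minimal brute-force sketch in Python of the reverse top-k definition given in the summary; it is not the paper's DiPaRT or DiPaRT+ algorithm and applies no pruning. It assumes a linear scoring function (weighted sum of attributes) and that lower scores rank higher; the function names (`score`, `reverse_topk`) are illustrative.

```python
def score(weights, obj):
    """Linear scoring function: weighted sum of an object's attributes."""
    return sum(w * x for w, x in zip(weights, obj))

def reverse_topk(objects, preferences, q, k):
    """Brute-force reverse top-k: return the preference (weight) vectors
    for which the query object q ranks among the k best-scoring objects.
    Assumes lower scores rank higher."""
    result = []
    for w in preferences:
        q_score = score(w, q)
        # Count database objects that score strictly better than q under w.
        better = sum(1 for obj in objects if score(w, obj) < q_score)
        if better < k:  # q is within the top-k for this preference
            result.append(w)
    return result

# Toy example: 2-D objects and preference vectors.
objects = [(1.0, 5.0), (4.0, 1.0), (3.0, 3.0)]
preferences = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
q = (2.0, 2.0)
print(reverse_topk(objects, preferences, q, k=1))  # -> [(0.5, 0.5)]
```

This naive version evaluates every preference against every object, which is exactly the CPU-intensive cost that motivates the pruning properties and parallel (MapReduce) processing studied in the paper.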
ISSN: 0926-8782
EISSN: 1573-7578
DOI: 10.1007/s10619-020-07297-9