Hardware-Conscious Hash-Joins on GPUs

Bibliographic Details
Published in	2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 698–709
Main Authors	Sioulas, Panagiotis, Chrysogelos, Periklis, Karpathiotakis, Manos, Appuswamy, Raja, Ailamaki, Anastasia
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2019
Subjects	analytics; Bandwidth; Engines; GPU; Graphics processing units; Hardware; Instruction sets; join; Partitioning algorithms; Performance evaluation
Summary:	Traditionally, analytical database engines have used the task parallelism provided by modern multisocket multicore CPUs to scale query execution. Over the past few years, GPUs have started gaining traction as accelerators for analytical queries due to their massively data-parallel nature and high memory bandwidth. Recent work on join algorithms for CPUs has shown that carefully tuned implementations that exploit the underlying hardware can outperform naive, hardware-oblivious counterparts and provide excellent performance on modern multicore servers. However, there has been no comparable analysis of hardware-conscious join algorithms for GPUs that systematically explores the dimensions of partitioning (partitioned versus non-partitioned joins), data location (data that fits versus does not fit in GPU device memory), and access pattern (skewed versus uniform). In this paper, we present the design and implementation of a family of novel, partitioning-based GPU join algorithms that are tuned to exploit various GPU hardware characteristics in order to work around the two main limitations of GPUs: limited memory capacity and the slow PCIe interface. Using a thorough evaluation, we show that: i) hardware-consciousness plays a key role in GPU joins, as it does in CPU joins, and our join algorithms can process 1 billion tuples/second even if no data is GPU-resident; ii) radix partitioning-based GPU joins that are tuned to exploit GPU hardware can substantially outperform non-partitioned hash joins; iii) hardware-conscious GPU joins can effectively overcome GPU limitations and match, or even outperform, state-of-the-art CPU joins.
ISSN:	2375-026X
DOI:	10.1109/ICDE.2019.00068
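The summary contrasts radix partitioning-based hash joins with non-partitioned hash joins. As a rough illustration of the partitioning idea only, the Python sketch below splits both inputs into 2^bits partitions using the low bits of the join key and then builds and probes a small hash table per partition pair. This is a single-threaded, CPU-only sketch under simplifying assumptions, not the authors' GPU algorithm: it ignores device memory, PCIe transfers, parallelism, and skew handling. The function names `radix_partition` and `partitioned_hash_join` and the `bits` parameter are hypothetical.

```python
from collections import defaultdict


def radix_partition(tuples, bits, shift=0):
    """Split (key, payload) tuples into 2**bits partitions.

    A tuple goes to the partition selected by `bits` key bits starting at
    bit position `shift` (hypothetical helper; illustrative only).
    """
    fanout = 1 << bits
    mask = fanout - 1
    parts = [[] for _ in range(fanout)]
    for key, payload in tuples:
        parts[(key >> shift) & mask].append((key, payload))
    return parts


def partitioned_hash_join(build, probe, bits=4):
    """Equi-join two lists of (key, payload) tuples on key.

    Returns (key, build_payload, probe_payload) triples. Equal keys always
    land in the same partition index, so only matching partition pairs
    need to be joined (hypothetical helper; illustrative only).
    """
    build_parts = radix_partition(build, bits)
    probe_parts = radix_partition(probe, bits)
    result = []
    for bpart, ppart in zip(build_parts, probe_parts):
        # Build phase: a hash table over this build partition only.
        table = defaultdict(list)
        for key, payload in bpart:
            table[key].append(payload)
        # Probe phase: look up each probe tuple of the matching partition.
        for key, payload in ppart:
            for bpayload in table.get(key, ()):
                result.append((key, bpayload, payload))
    return result


build = [(1, "a"), (2, "b"), (17, "c")]
probe = [(17, "x"), (2, "y"), (5, "z")]
print(sorted(partitioned_hash_join(build, probe)))
# → [(2, 'b', 'y'), (17, 'c', 'x')]
```

The usual motivation for partitioning first is that each per-partition hash table becomes small enough to stay in a fast, small memory level (a cache on CPUs, or on-chip memory on GPUs), which turns random hash-table accesses into cheap ones; the `shift` parameter hints at how larger fanouts are often reached with several partitioning passes over different key bits.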