Hardware-Conscious Hash-Joins on GPUs

Traditionally, analytical database engines have used task parallelism provided by modern multisocket multicore CPUs for scaling query execution. Over the past few years, GPUs have started gaining traction as accelerators for processing analytical queries due to their massively data-parallel nature a...

Full description

Saved in:
Bibliographic Details
Published in2019 IEEE 35th International Conference on Data Engineering (ICDE) pp. 698 - 709
Main Authors Sioulas, Panagiotis, Chrysogelos, Periklis, Karpathiotakis, Manos, Appuswamy, Raja, Ailamaki, Anastasia
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.04.2019
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Traditionally, analytical database engines have used task parallelism provided by modern multisocket multicore CPUs for scaling query execution. Over the past few years, GPUs have started gaining traction as accelerators for processing analytical queries due to their massively data-parallel nature and high memory bandwidth. Recent work on designing join algorithms for CPUs has shown that carefully tuned join implementations that exploit underlying hardware can outperform naive, hardware-oblivious counterparts and provide excellent performance on modern multicore servers. However, there has been no such systematic analysis of hardware-conscious join algorithms for GPUs that systematically explores the dimensions of partitioning (partitioned versus non-partitioned joins), data location (data fitting and not fitting in GPU device memory), and access pattern (skewed versus uniform). In this paper, we present the design and implementation of a family of novel, partitioning-based GPU-join algorithms that are tuned to exploit various GPU hardware characteristics for working around the two main limitations of GPUs-limited memory capacity and slow PCIe interface. Using a thorough evaluation, we show that: i) hardware-consciousness plays a key role in GPU joins similar to CPU joins and our join algorithms can process 1 Billion tuples/second even if no data is GPU resident, ii) radix partitioning-based GPU joins that are tuned to exploit GPU hardware can substantially outperform non-partitioned hash joins, iii) hardware-conscious GPU joins can effectively overcome GPU limitations and match, or even outperform, state-of-the-art CPU joins.
ISSN:2375-026X
DOI:10.1109/ICDE.2019.00068