An effective algorithm for parallelizing sort merge joins in the presence of data skew
Published in | Databases in Parallel and Distributed Systems: 2nd International Symposium pp. 103 - 115 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | New York, NY, USA: ACM, 01.07.1990 |
Series | ACM Conferences |
ISBN | 9780818620522 0818620528 |
DOI | 10.1145/319057.319072 |
Summary: | Parallel processing of relational queries has received considerable attention of late. However, in the presence of data skew, the speedup from conventional parallel join algorithms can be very limited, due to load imbalances among the various processors. Even a single large skew element can cause a processor to become overloaded. In this paper, we propose a parallel sort merge join algorithm which uses a divide-and-conquer approach to address the data skew problem. The proposed algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements, and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase in a CPU-bound environment, and is shown to be very robust relative to the degree of data skew and the total number of processors. |
---|---|
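
The scheduling idea described in the summary can be illustrated with a minimal sketch. This is not the paper's optimization algorithm, which the record does not spell out. It swaps in a simple stand-in: any join value whose estimated cost exceeds the average per-processor load is split into equal pieces, and the pieces are then placed with a greedy longest-processing-time-first heuristic. The cost model (tuples in R times tuples in S per join value) and the names `zipf_counts` and `schedule_join` are assumptions made for illustration only.

```python
import heapq
import math


def zipf_counts(n_values, n_tuples, theta=1.0):
    """Hypothetical helper: Zipf-like tuple counts per join value (value 0 is the most frequent)."""
    weights = [1.0 / (rank ** theta) for rank in range(1, n_values + 1)]
    total = sum(weights)
    return {v: max(1, round(n_tuples * w / total)) for v, w in enumerate(weights)}


def schedule_join(r_counts, s_counts, n_procs):
    """Return (loads, assignment) for the join phase.

    Assumed cost model: joining value v costs r_counts[v] * s_counts[v].
    """
    costs = {v: r_counts[v] * s_counts[v] for v in r_counts.keys() & s_counts.keys()}
    loads = [0.0] * n_procs
    assignment = {p: [] for p in range(n_procs)}
    total = sum(costs.values())
    if total == 0:
        return loads, assignment

    # Ideal per-processor load if the work could be divided perfectly.
    target = total / n_procs

    # Split any value heavier than the target into equal pieces, so that
    # a single large skew element is shared by several processors.
    tasks = []
    for value, cost in costs.items():
        pieces = max(1, min(n_procs, math.ceil(cost / target)))
        tasks.extend((cost / pieces, value) for _ in range(pieces))

    # Greedy placement: largest piece first, always onto the least-loaded processor.
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    for piece_cost, value in sorted(tasks, key=lambda t: t[0], reverse=True):
        load, proc = heapq.heappop(heap)
        load += piece_cost
        loads[proc] = load
        assignment[proc].append((value, piece_cost))
        heapq.heappush(heap, (load, proc))
    return loads, assignment


if __name__ == "__main__":
    r = zipf_counts(n_values=1000, n_tuples=100_000)
    s = zipf_counts(n_values=1000, n_tuples=100_000)
    loads, _ = schedule_join(r, s, n_procs=8)
    print("max/avg load:", max(loads) / (sum(loads) / len(loads)))
```

Because every piece costs at most the average load, the greedy placement in this sketch keeps the heaviest processor within twice the average. Without the splitting step, a single dominant join value would fix the maximum load regardless of how many processors are available, which is the load imbalance the summary describes.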