A parallel sort merge join algorithm for managing data skew


Bibliographic Details
Published in	IEEE Transactions on Parallel and Distributed Systems, Vol. 4, no. 1, pp. 70-86
Main Authors	Wolf, J.L., Dias, D.M., Yu, P.S.
Format	Journal Article
Language	English
Published	IEEE 01.01.1993
Subjects	Costs; Delay; Load management; Parallel architectures; Parallel processing; Processor scheduling; Proposals; Relational databases; Robustness; Scheduling algorithm

Summary:	A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase and is shown to be very robust with respect to, among other things, the degree of data skew and the total number of processors.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/71.205654
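
Illustrative sketch:	The summary does not spell out the optimization algorithm used in the scheduling phase, so the following Python is only a rough illustration of the general idea, not the authors' method. It assumes a simple cost model (join cost of a key value = matching rows in one relation times matching rows in the other), splits any key value whose cost exceeds the ideal per-processor share into equal pieces, and then places pieces with a standard largest-first greedy heuristic (LPT) standing in for the paper's optimizer. The names `zipf_counts`, `schedule`, and `theta` are hypothetical.

```python
import heapq
import math


def zipf_counts(num_keys, total_rows, theta=1.0):
    """Rows per join-key value under a Zipf-like law (rank r has weight 1/r**theta)."""
    weights = [1.0 / (rank ** theta) for rank in range(1, num_keys + 1)]
    norm = sum(weights)
    return [max(1, round(total_rows * w / norm)) for w in weights]


def schedule(costs, num_procs):
    """Return (loads, assignment) for the join phase.

    costs[k] is the estimated join cost of key value k (assumed cost model).
    Divide step: a key costing more than the ideal per-processor share is
    split into several equal pieces, so it can run on several processors.
    Placement step: pieces go, largest first, to the least-loaded processor
    (LPT greedy heuristic, used here as a stand-in optimizer).
    """
    target = sum(costs) / num_procs
    pieces = []
    for key, cost in enumerate(costs):
        parts = min(num_procs, max(1, math.ceil(cost / target)))
        pieces.extend((cost / parts, key) for _ in range(parts))
    pieces.sort(reverse=True)

    heap = [(0.0, proc) for proc in range(num_procs)]  # (current load, processor)
    assignment = {proc: [] for proc in range(num_procs)}
    for cost, key in pieces:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(key)
        heapq.heappush(heap, (load + cost, proc))

    loads = [0.0] * num_procs
    for load, proc in heap:
        loads[proc] = load
    return loads, assignment


# Two relations with Zipf-like skew on the join attribute (synthetic data).
r_counts = zipf_counts(num_keys=1000, total_rows=100_000)
s_counts = zipf_counts(num_keys=1000, total_rows=100_000)
costs = [r * s for r, s in zip(r_counts, s_counts)]

loads, assignment = schedule(costs, num_procs=8)
imbalance = max(loads) / (sum(loads) / len(loads))  # 1.0 means perfectly balanced
print(f"max load / mean load = {imbalance:.3f}")
```

Without the splitting step, the single most frequent key value would land on one processor and dominate the join phase. Splitting it across several processors is what lets the greedy placement get close to an even load in this toy setting.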