An effective algorithm for parallelizing sort merge joins in the presence of data skew

Bibliographic Details
Published in	Databases in Parallel and Distributed Systems: 2nd International Symposium pp. 103 - 115
Main Authors	Wolf, J.L., Dias, D.M., Yu, P.S.
Format	Conference Proceeding
Language	English
Published	IEEE Comput. Soc. Press 1990
Subjects	Delay; Load management; Parallel architectures; Parallel processing; Processor scheduling; Proposals; Prototypes; Relational databases; Robustness; Scheduling algorithm
ISBN	9780818620522 0818620528
DOI	10.1109/DPDS.1990.113702

Summary:	A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The algorithm adds an extra scheduling phase to the usual sort, transfer and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution for data skew, the algorithm is shown to achieve very good load balancing for the join phase in a CPU-bound environment and to be very robust relative to the degree of data skew and the total number of processors.
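
The scheduling idea in the summary can be illustrated with a small sketch. This is not the optimization algorithm from the paper, whose details are not given in this record. It is a simplified greedy heuristic (longest-processing-time-first) built on stated assumptions: the join cost of a value is taken to be the product of its tuple counts in the two relations, a heavily skewed value may be split evenly across several processors, and the names `zipf_counts`, `schedule_join` and `imbalance` are hypothetical.

```python
import heapq
import math


def zipf_counts(num_values, total_tuples, theta):
    """Tuple count per join value under a Zipf-like law (count_i ~ 1 / i**theta)."""
    weights = [1.0 / (i ** theta) for i in range(1, num_values + 1)]
    scale = total_tuples / sum(weights)
    # Every value keeps at least one tuple so that it still produces a join task.
    return [max(1, round(w * scale)) for w in weights]


def schedule_join(costs, num_procs):
    """Greedy load balancing of join tasks; returns the load of each processor.

    Assumption: a task whose cost exceeds the ideal per-processor load can be
    split into equal pieces, one piece per processor that shares the value.
    """
    target = sum(costs) / num_procs  # ideal load per processor
    tasks = []
    for cost in costs:
        # Large skew elements are divided over several processors.
        pieces = min(num_procs, max(1, math.ceil(cost / target)))
        tasks.extend([cost / pieces] * pieces)

    # Longest-processing-time-first: place big tasks first, each on the
    # currently least-loaded processor (tracked with a min-heap).
    tasks.sort(reverse=True)
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    for task in tasks:
        load, proc = heapq.heappop(heap)
        heapq.heappush(heap, (load + task, proc))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


def imbalance(loads):
    """Ratio of the busiest processor's load to the average load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    r_counts = zipf_counts(1000, 100_000, theta=1.0)
    s_counts = zipf_counts(1000, 100_000, theta=1.0)
    costs = [r * s for r, s in zip(r_counts, s_counts)]  # assumed CPU cost model
    print(imbalance(schedule_join(costs, num_procs=8)))
```

As a small check of the splitting step, `schedule_join([100, 1, 1, 1, 1], 2)` returns `[52.0, 52.0]`: the dominant value is divided between both processors, whereas without splitting one processor would carry a load of 100 and the other a load of 4.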