Skew-Tolerant Key Distribution for Load Balancing in MapReduce
Published in | IEICE Transactions on Information and Systems Vol. E95.D; no. 2; pp. 677 - 680 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Oxford: The Institute of Electronics, Information and Communication Engineers, 2012; Oxford University Press |
Subjects | |
Summary: | MapReduce is a parallel processing framework for large-scale data. In the reduce phase, MapReduce uses a hash scheme to distribute data that share the same key across cluster nodes. However, this approach is not robust to skewed data distributions. In this paper, we propose a skew-tolerant key distribution method for MapReduce that assigns keys to cluster nodes so as to balance their workloads. We implemented the proposed method on Hadoop. Through experiments, we evaluate its performance against the conventional method. |
---|---|
ISSN: | 0916-8532, 1745-1361 |
DOI: | 10.1587/transinf.E95.D.677 |
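The summary contrasts the conventional hash scheme with a load-aware key assignment but does not describe the paper's algorithm. As a generic illustration only (not the authors' method), the Python sketch below compares two approaches. The first is hash partitioning in the spirit of Hadoop's default `HashPartitioner`, which sends a key to its hash modulo the number of reduce tasks. The second is a greedy longest-processing-time assignment that places the heaviest keys first, each on the currently least-loaded node. The function names, the use of `zlib.crc32` as a stand-in key hash, and the assumption that per-key record counts are known in advance are all hypothetical.

```python
import heapq
import zlib
from collections import Counter


def hash_partition(key_counts, num_nodes):
    """Conventional scheme: each key goes to hash(key) mod num_nodes,
    no matter how many records it carries. zlib.crc32 is a stand-in
    (assumption) for the framework's own key hash."""
    loads = [0] * num_nodes
    assignment = {}
    for key, count in key_counts.items():
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        assignment[key] = node
        loads[node] += count
    return assignment, loads


def greedy_balanced_partition(key_counts, num_nodes):
    """Illustrative load-aware scheme (greedy LPT, not the paper's method):
    visit keys from heaviest to lightest and give each to the node with the
    smallest current load. A key is never split across nodes."""
    heap = [(0, node) for node in range(num_nodes)]  # entries are (load, node id)
    heapq.heapify(heap)
    assignment = {}
    # Descending record count; ties broken by key for deterministic output.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[key] = node
        heapq.heappush(heap, (load + count, node))
    loads = [0] * num_nodes
    for key, node in assignment.items():
        loads[node] += key_counts[key]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical skewed input: one hot key plus many single-record keys.
    counts = Counter(["hot"] * 600 + [f"k{i}" for i in range(400)])
    _, hash_loads = hash_partition(counts, 4)
    _, greedy_loads = greedy_balanced_partition(counts, 4)
    print("hash   max load:", max(hash_loads), hash_loads)
    print("greedy max load:", max(greedy_loads), greedy_loads)
```

With one hot key of 600 records and 400 single-record keys on four nodes, the greedy scheme gives the hot key a node of its own and spreads the remaining keys evenly over the other three, so its maximum load equals the 600-record lower bound. Under hashing, the hot key's node also receives whatever light keys happen to hash to it. Because neither scheme splits a key, neither can push the maximum load below the size of the largest key.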