Skew-Tolerant Key Distribution for Load Balancing in MapReduce

Bibliographic Details
Published in IEICE Transactions on Information and Systems, Vol. E95.D, no. 2, pp. 677-680
Main Authors SON, Jihoon; CHOI, Hyunsik; CHUNG, Yon Dohn
Format Journal Article
Language English
Published Oxford The Institute of Electronics, Information and Communication Engineers 2012
Oxford University Press
Summary: MapReduce is a parallel processing framework for large-scale data. In the reduce phase, MapReduce uses a hash scheme to distribute data across cluster nodes so that all records sharing the same key reach the same node. However, this approach is not robust to skewed data distributions. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes so that their workloads are balanced. We implemented the proposed method on Hadoop. Through experiments, we evaluate its performance in comparison with the conventional method.
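
To make the contrast in the summary concrete, the following is a minimal Python sketch, not the authors' algorithm (the record does not describe it). It compares conventional hash partitioning with a simple greedy heuristic that places the heaviest keys first on the least-loaded reducer. The function names, the use of CRC32 as the hash, and the sample key counts are all assumptions made for illustration.

```python
import heapq
import zlib


def hash_partition(key_counts, num_reducers):
    """Conventional scheme: reducer index = hash(key) mod number of reducers.

    CRC32 stands in for the framework's hash function (an assumption).
    Returns the total record count landing on each reducer.
    """
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += count
    return loads


def greedy_partition(key_counts, num_reducers):
    """Illustrative load-aware scheme (hypothetical, not the paper's method).

    Keys are visited from heaviest to lightest, and each one is assigned to
    the reducer with the smallest load so far. All records of a key still go
    to a single reducer, as the reduce phase requires.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return assignment, loads


# Hypothetical skewed key frequencies: one key dominates.
sample_counts = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
hash_loads = hash_partition(sample_counts, 2)
assignment, greedy_loads = greedy_partition(sample_counts, 2)
```

A heuristic like this needs the key frequencies in advance (for example, from sampling the map output), whereas hash partitioning needs no such knowledge; how the paper obtains or approximates that information is not stated in this record.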
ISSN: 0916-8532
1745-1361
DOI: 10.1587/transinf.E95.D.677