Energy Analysis of Hadoop Cluster Failure Recovery

Bibliographic Details
Published in 2013 International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 141-146
Main Authors Weiyue Xu, Ying Lu
Format Conference Proceeding
Language English
Published IEEE 01.12.2013

Summary: Energy efficiency is now an important metric for evaluating a computing system. However, saving energy is challenging because of many constraints. For example, Hadoop, one of the most popular distributed processing frameworks, randomly distributes three replicas of each data block across the cluster to improve performance and fault tolerance. This mechanism limits the number of machines that can be turned off to save energy without affecting data availability. To overcome this limitation, previous research introduced a mechanism called the covering subset, which maintains a set of active nodes that keeps all data immediately available even when every other node is turned off. The covering-subset mechanism works smoothly as long as no failure occurs, but a node in the covering subset may fail. In this paper, we study energy-efficient failure recovery in Hadoop clusters. Rather than relying only on replication, which Hadoop adopts by default, we investigate both replication and erasure coding as possible redundancy mechanisms. We develop failure recovery algorithms for both systems and analytically compare their energy efficiency.
ISSN: 2379-5352
DOI: 10.1109/PDCAT.2013.29
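
The covering subset idea mentioned in the summary can be illustrated with a small sketch. This is not the paper's algorithm: the greedy selection, the function names (uncovered_blocks, greedy_covering_subset), and the six-node block placement are all assumptions made for illustration. It shows only that a set of active nodes holding at least one replica of every block keeps the data available, and that losing one of those nodes can leave blocks unreachable until another node is powered on.

```python
from typing import Dict, Set

# Hypothetical placement map: block id -> nodes holding a replica of it.
Placement = Dict[str, Set[str]]


def uncovered_blocks(placement: Placement, active: Set[str]) -> Set[str]:
    """Return the blocks that have no replica on any active node."""
    return {block for block, nodes in placement.items() if not (nodes & active)}


def greedy_covering_subset(placement: Placement) -> Set[str]:
    """Pick nodes greedily until every block has a replica on a chosen node.

    Illustrative heuristic only (assumed here, not taken from the paper).
    """
    remaining = set(placement)
    all_nodes = sorted({n for nodes in placement.values() for n in nodes})
    chosen: Set[str] = set()
    while remaining:
        # Choose the node that covers the most still-uncovered blocks.
        best = max(
            all_nodes,
            key=lambda n: sum(1 for b in remaining if n in placement[b]),
        )
        gained = {b for b in remaining if best in placement[b]}
        if not gained:
            raise ValueError("some block has no replica on any node")
        chosen.add(best)
        remaining -= gained
    return chosen


# Made-up example: four blocks, three replicas each, six nodes.
placement: Placement = {
    "b1": {"n1", "n2", "n3"},
    "b2": {"n1", "n4", "n5"},
    "b3": {"n2", "n4", "n6"},
    "b4": {"n3", "n5", "n6"},
}

cover = greedy_covering_subset(placement)
print("covering subset:", sorted(cover))

# With the covering subset active, every block stays available.
print("uncovered:", sorted(uncovered_blocks(placement, cover)))

# If a covering node fails, some blocks lose their only active replica.
print("after n6 fails:", sorted(uncovered_blocks(placement, cover - {"n6"})))
```

In this made-up placement two of the six nodes suffice to cover all four blocks, so the other four could be powered down. The failure case is what motivates the recovery problem the paper studies: after the failure, at least one sleeping node has to be woken to restore coverage.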