A dynamic block device reconfiguration algorithm in virtual MapReduce cluster

Bibliographic Details
Published in	Cluster Computing, Vol. 17, no. 4, pp. 1171-1183
Main Authors	Lee, Kwonyong, Nam, Yoonsung, Kim, Taekhee, Park, Sungyong, Lee, Hyuk-Jun, Yang, Jihoon
Format	Journal Article
Language	English
Published	Boston: Springer US, 01.12.2014; Springer Nature B.V.
Subjects	Algorithms; Cloud computing; Computer Communication Networks; Computer Science; Data transfer (computers); Multiple regression analysis; Operating Systems; Processor Architectures; Reconfiguration; Virtual environments; Xen; Block device reconfiguration; Virtual cluster; Cloud MapReduce
Summary:	With advances in cloud computing and virtualization technologies, running MapReduce applications on clouds has attracted increasing attention in recent years. A fundamental problem, however, is that the performance of MapReduce applications can be severely degraded by the overhead of I/O virtualization and by resource competition among virtual machines. In this paper, we propose a dynamic block device reconfiguration algorithm for virtual MapReduce clusters, which reduces the data transfer time between virtual machines and thereby improves the performance of MapReduce applications running on clouds. The proposed algorithm uses a block device reconfiguration scheme in which a block device attached to one virtual machine can be dynamically detached and reattached to another virtual machine at runtime. This scheme makes it possible to move files across virtual machines without any network transfer between them. The algorithm is also dynamic in the sense that it estimates the total data transfer time between virtual machines using multiple regression analysis based on CPU utilization and data size, and adaptively determines a least-cost data transfer path between a mapper virtual machine and a reducer virtual machine. We have implemented the algorithm in Hadoop MapReduce. Benchmarking results show that the overhead incurred by transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened by up to 14%.
ISSN:	1386-7857 1573-7543
DOI:	10.1007/s10586-014-0375-y
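
Illustrative sketch (not from the paper): the summary describes estimating transfer time by multiple regression on CPU utilization and data size, then choosing the least-cost path between a mapper and a reducer virtual machine. The Python below shows one way that idea might look. Everything in it is an assumption made for illustration: the linear model form, the two path names ("network" and "block_device"), the function names, and the synthetic training data, which are generated from made-up coefficients and are not measurements from the article.

```python
import numpy as np


def fit_transfer_model(cpu_util, data_mb, seconds):
    """Least-squares fit of: seconds ~= b0 + b1 * cpu_util + b2 * data_mb."""
    X = np.column_stack([np.ones(len(cpu_util)), cpu_util, data_mb])
    coef, *_ = np.linalg.lstsq(X, np.asarray(seconds, dtype=float), rcond=None)
    return coef


def predict_seconds(coef, cpu_util, data_mb):
    """Estimated transfer time for one (CPU utilization, data size) pair."""
    return float(coef[0] + coef[1] * cpu_util + coef[2] * data_mb)


def choose_path(models, cpu_util, data_mb):
    """Return the path with the lowest estimated time, plus all estimates."""
    estimates = {
        name: predict_seconds(coef, cpu_util, data_mb)
        for name, coef in models.items()
    }
    best = min(estimates, key=estimates.get)
    return best, estimates


# Synthetic training data (hypothetical): CPU utilization in percent, size in MB.
cpu = np.array([10, 30, 50, 70, 90, 20, 60, 80], dtype=float)
size = np.array([64, 128, 256, 512, 1024, 768, 32, 384], dtype=float)

# Made-up ground truth: the network path has a low fixed cost but a high
# per-MB cost; reattaching a block device has a higher fixed cost but a
# low per-MB cost.
network_s = 0.5 + 0.02 * cpu + 0.010 * size
reconfig_s = 2.0 + 0.005 * cpu + 0.002 * size

models = {
    "network": fit_transfer_model(cpu, size, network_s),
    "block_device": fit_transfer_model(cpu, size, reconfig_s),
}

if __name__ == "__main__":
    for cpu_now, mb in [(10.0, 16.0), (50.0, 1024.0)]:
        best, est = choose_path(models, cpu_now, mb)
        print(cpu_now, mb, best, est)
```

Under these made-up coefficients, small transfers favor the network path and large transfers favor block device reconfiguration. The actual crossover point in a real cluster would depend on measured data.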