Power Efficient MapReduce Workload Acceleration Using Integrated-GPU

Bibliographic Details
Published in	2015 IEEE First International Conference on Big Data Computing Service and Applications, pp. 162-169
Main Authors	Kim, SungYe; Bottleson, Jeremy; Jin, Jingyi; Bindu, Preeti; Sakhare, Snehal C.; Spisak, Joseph S.
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2015
Subjects	Acceleration; Big Data; GPGPU; Graphics processing units; Hadoop; Integrated Graphics; Java; Kernel; Machine Learning; Mahout; OpenCL; Optimization; Performance gain; Power demand

More Information
Summary:	With the pervasiveness of MapReduce, one of the most prominent programming models for data parallelism in Apache Hadoop, many researchers and developers have invested considerable effort in boosting the computational speed and energy efficiency of MapReduce-based big data processing. However, the scalable and fault-tolerant nature of MapReduce introduces additional costs in disk IO and data transfer, caused by streaming intermediate outputs to disk. In light of these issues, many research projects have been initiated with the goal of improving the compute speed and power efficiency of compute-intensive cloud computing workloads, several of them by adding discrete GPUs. In this work, we present a modified MapReduce approach, focused on the iterative clustering algorithms in the Apache Mahout machine learning library, that leverages the acceleration potential of the Intel integrated GPU in a multi-node cluster environment. Evaluated with the HiBench benchmark suite, the accelerated framework shows varying levels of speed-up: ≈45x for the Map tasks alone and ≈4.37x for the entire K-means clustering. Based on various experiments and in-depth analysis, we find that utilizing the integrated GPU via OpenCL offers significant performance and power efficiency gains over the original CPU-based approach. We further analyze the correlations among compute, IO, and power efficiency. Overall, our results show that embracing the integrated GPU in the Hadoop MapReduce framework is a promising way to add cost- and energy-efficient compute parallelism to a data-parallel multi-node environment.
DOI:	10.1109/BigDataService.2015.12
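For readers less familiar with how K-means fits the MapReduce model, the following is a minimal, hypothetical Python sketch of the iteration structure: the map step assigns each point to its nearest centroid, a shuffle groups points by centroid index, and the reduce step averages each group into a new centroid. It is not the authors' implementation and uses none of the Hadoop, Mahout, or OpenCL APIs; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`), the tolerance, and the rule of keeping empty clusters unchanged are assumptions for illustration only.

```python
import math
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, point).

    Each call depends only on one point and the current centroids, so this
    is the embarrassingly data-parallel part of an iteration.
    """
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step (hypothetical): component-wise mean of the assigned points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centroids):
    """One MapReduce-style iteration: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for p in data:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid that receives no points keeps its old position.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(data, centroids, max_iter=10, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(data, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(centroids, updated)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

In this sketch the per-point distance work in `kmeans_map` corresponds to the Map tasks for which the abstract reports the largest speed-up. In a real Hadoop job, the shuffle and the repeated per-iteration passes are where the intermediate-output disk IO and data-transfer costs described above would arise, which is one plausible reason the end-to-end speed-up is smaller than the Map-only figure.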