Optimizing cost and performance trade-offs for MapReduce job processing in the cloud
Published in | 2014 IEEE Network Operations and Management Symposium (NOMS), pp. 1-8 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2014 |
Summary: | Cloud computing offers customers a new, attractive option: provisioning a suitably sized Hadoop cluster, consuming resources as a service, executing the MapReduce workload, and paying for the time these resources are used. One of the open questions in such environments is the choice and the amount of resources that a user should lease from the service provider. In this work, we offer a framework for evaluating and selecting the right underlying platform (e.g., small, medium, or large EC2 instances) to achieve the desired Service Level Objectives (SLOs). A user can define different SLOs: i) achieving a given completion time for a set of MapReduce jobs while minimizing the cost (budget), or ii) for a given budget, selecting the type and number of instances that optimize the MapReduce workload performance (i.e., the completion time of the jobs). We demonstrate that the application performance of a customer workload may vary significantly across platforms, which makes selecting the best cost/performance platform for a given workload a challenging problem. Our evaluation study and experiments with the Amazon EC2 platform reveal that, for different workload mixes, an optimized platform choice may yield 37-70% cost savings compared with different (but seemingly equivalent) choices that achieve the same performance objectives. The results of our simulation study are validated through experiments with Hadoop clusters deployed on different Amazon EC2 instances. |
---|---|
ISSN: | 1542-1201, 2374-9709 |
DOI: | 10.1109/NOMS.2014.6838231 |
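The abstract does not spell out the paper's performance model, so the following is only a rough illustration of the two SLO formulations: a brute-force search over (instance type, instance count) pairs that either minimizes cost under a completion-time deadline (SLO i) or minimizes completion time under a budget (SLO ii). Everything concrete here is a hypothetical assumption, not the authors' method: the linear completion-time model with a fixed setup overhead, continuous (non-hourly) billing, the `InstanceType`/`Plan` names and helper functions, and all prices and throughput figures, which are made up and are not EC2 prices or measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    price_per_hour: float  # hypothetical price per instance-hour (not a real EC2 price)
    work_per_hour: float   # hypothetical units of MapReduce work one instance completes per hour


@dataclass(frozen=True)
class Plan:
    instance: InstanceType
    count: int
    hours: float  # estimated completion time of the whole workload
    cost: float   # count * price_per_hour * hours (continuous billing assumed)


def estimate(inst: InstanceType, count: int, work: float, setup_hours: float) -> Plan:
    # Toy model (assumption): fixed setup overhead plus work divided evenly across instances.
    hours = setup_hours + work / (count * inst.work_per_hour)
    cost = count * inst.price_per_hour * hours
    return Plan(inst, count, hours, cost)


def candidate_plans(catalog: List[InstanceType], work: float,
                    setup_hours: float, max_count: int) -> List[Plan]:
    # Brute-force enumeration of every (instance type, cluster size) pair.
    return [estimate(inst, n, work, setup_hours)
            for inst in catalog for n in range(1, max_count + 1)]


def cheapest_within_deadline(plans: List[Plan], deadline_hours: float) -> Optional[Plan]:
    # SLO (i): meet the completion-time target at minimum cost; None if infeasible.
    feasible = [p for p in plans if p.hours <= deadline_hours]
    return min(feasible, key=lambda p: (p.cost, p.hours), default=None)


def fastest_within_budget(plans: List[Plan], budget: float) -> Optional[Plan]:
    # SLO (ii): stay within the budget while minimizing completion time; None if infeasible.
    feasible = [p for p in plans if p.cost <= budget]
    return min(feasible, key=lambda p: (p.hours, p.cost), default=None)


# Illustrative catalog: relative prices and speeds are invented for this sketch.
catalog = [
    InstanceType("small", price_per_hour=1.0, work_per_hour=1.0),
    InstanceType("medium", price_per_hour=2.0, work_per_hour=2.5),
    InstanceType("large", price_per_hour=4.0, work_per_hour=4.5),
]
plans = candidate_plans(catalog, work=100.0, setup_hours=0.5, max_count=50)

best = cheapest_within_deadline(plans, deadline_hours=5.0)
fast = fastest_within_budget(plans, budget=95.5)
print(best.instance.name, best.count, round(best.cost, 2))
print(fast.instance.name, fast.count, round(fast.hours, 2))
```

Under these invented numbers the cheapest plan for the 5-hour deadline uses 9 "medium" instances, beating both the cheapest-per-hour and the fastest instance types, which mirrors the abstract's point that seemingly equivalent choices can differ in cost. With real workloads the outcome depends entirely on per-platform job performance, which this toy model does not capture; exhaustive enumeration is used only because the search space here is tiny.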