Workload Characterization and Green Scheduling on Heterogeneous Clusters

Bibliographic Details
Published in: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p. 3
Main Authors: Jivrajani, Aarti; Raghu, Dhanya; KH, Apoorva; Phalachandra, H L; Sitaram, Dinkar
Format: Conference Proceeding
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2016

Summary:
Conference Title: 2016 22nd Annual International Conference on Advanced Computing and Communication (ADCOM)
Conference Start Date: 2016, Sept. 8
Conference End Date: 2016, Sept. 10
Conference Location: Bangalore, India

With the emergence of large, heterogeneous, shared computing clusters, their efficient use by mixed distributed workloads and tenants remains an important challenge. This paper focuses on scheduling batches of similar jobs on a cluster of machines with similar characteristics, in order to fully utilize the resources at each node and reduce the energy consumption of the datacenter. We use a novel algorithm to find the best-fit machine in that cluster. We perform hierarchical clustering to identify common groups of jobs and common groups of machines, which is a more efficient way of clustering than conventional k-means clustering. To estimate the energy savings, we analyze a recent Google release of scheduler request and utilization data across a large (12500+) general-purpose compute cluster over 29 days. We offer a statistical profile of the data, with several interesting discoveries regarding the batching of similar tasks in a heterogeneous dataset, CPU and memory consumption, task durations, and more.
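
The two ideas named in the summary, grouping similar jobs by hierarchical clustering and then placing a job on a best-fit machine, can be illustrated with a minimal sketch. The record does not describe the authors' algorithm, so everything here is an assumption for illustration only: the average-linkage merge rule, the function names `agglomerate` and `best_fit`, the least-leftover-capacity placement rule, and the sample (CPU, memory) values are all hypothetical and are not taken from the paper or from the Google trace.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerate(points, n_groups):
    """Bottom-up hierarchical clustering with average linkage (illustrative).

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until n_groups remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        # combinations() yields index pairs with a < b, so deleting b
        # never shifts the position of a.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def best_fit(demand, machines):
    """Generic best-fit heuristic (not the paper's algorithm).

    Returns the index of the machine that can hold `demand` with the least
    total leftover capacity, or None if no machine fits.
    """
    best, best_slack = None, None
    for idx, free in enumerate(machines):
        if all(f >= d for f, d in zip(free, demand)):
            slack = sum(f - d for f, d in zip(free, demand))
            if best_slack is None or slack < best_slack:
                best, best_slack = idx, slack
    return best


# Hypothetical normalized (CPU, memory) requests for six jobs.
jobs = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.12),
        (0.50, 0.40), (0.55, 0.45), (0.52, 0.42)]
job_groups = agglomerate(jobs, 2)

# Hypothetical free (CPU, memory) capacity of three machines.
machines = [(1.00, 1.00), (0.60, 0.50), (0.25, 0.25)]
chosen = best_fit(jobs[3], machines)
```

With this sample data the small and large jobs separate into two groups, and the large job is placed on the machine whose free capacity it fills most tightly rather than on the emptiest one, which is the packing behaviour that lets lightly used machines stay idle. The same `agglomerate` function could be applied to machine capacity vectors to form machine groups.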