New scheduling approach using reinforcement learning for heterogeneous distributed systems

Bibliographic Details
Published in	Journal of parallel and distributed computing Vol. 117; pp. 292 - 302
Main Authors	Orhean, Alexandru Iulian; Pop, Florin; Raicu, Ioan
Format	Journal Article
Language	English
Published	Elsevier Inc 01.07.2018
Subjects	Distributed systems; Machine learning; SARSA; Scheduling

Summary:	Computer clusters, cloud computing and the exploitation of parallel architectures and algorithms have become the norm when dealing with scientific applications that work with large quantities of data and perform complex and time-consuming calculations. With the rise of social media applications and smart devices, the amount of digital data and the velocity at which it is produced have increased exponentially, driving the development of distributed system frameworks and platforms that increase the productivity, consistency, fault tolerance and security of parallel applications. The performance of such systems is mainly influenced by the arrangement and composition of the physical machines, the resource allocation, and the scheduling of jobs and tasks. This paper proposes a reinforcement learning algorithm to solve the scheduling problem in distributed systems. The machine learning technique takes into consideration the heterogeneity of the nodes and their placement within the grid, as well as the arrangement of tasks in a directed acyclic graph (DAG) of dependencies, and ultimately determines a scheduling policy that yields a better execution time. This paper also proposes a platform, in which the algorithm is implemented, that offers scheduling as a service to distributed systems.
• Reinforcement learning algorithm for the scheduling problem.
• Integration of machine learning methods in systems that use task schedulers.
• Q-learning and state–action–reward–state–action (SARSA) methods.
• DAG scheduling on dynamic clusters.
• Variable tasks and task classification.
ISSN:	0743-7315, 1096-0848
DOI:	10.1016/j.jpdc.2017.05.001
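
The summary names Q-learning and SARSA as the reinforcement learning methods used for scheduling. This record does not describe how the paper encodes states, actions, or rewards, so the following is only a minimal sketch of the two standard tabular update rules. The function names, the `(state, action)` table layout, and the idea that an action stands for assigning a task to a node are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in next_state."""
    # Hypothetical encoding: an action could mean "assign the task to node X".
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


# Toy table: unseen (state, action) pairs default to a value of 0.0.
q = defaultdict(float)
q[("s1", "fast_node")] = 2.0
q[("s1", "slow_node")] = 1.0
q_learning_update(q, "s0", "fast_node", 1.0, "s1",
                  ["fast_node", "slow_node"], alpha=0.5, gamma=1.0)
```

The only difference between the two rules is the bootstrap term: Q-learning uses the highest-valued next action, while SARSA uses the next action the current policy actually selects, so SARSA's estimates reflect the exploration behaviour of the scheduler being trained.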