An improved ant colony optimization algorithm with fault tolerance for job scheduling in grid computing systems



Bibliographic Details
Published in: PLoS ONE Vol. 12; no. 5; p. e0177567
Main Authors: Idris, Hajara; Ezugwu, Absalom E; Junaidu, Sahalu B; Adewumi, Aderemi O
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 17.05.2017

Summary: The Grid scheduler schedules user jobs on the best available resource, in terms of resource characteristics, by optimizing job execution time. Resource failure in the Grid is no longer an exception but a regularly occurring event, as resources are increasingly used by the scientific community to solve computationally intensive problems that typically run for days or even months. It is therefore essential that these long-running applications tolerate failures and avoid recomputation from scratch after a resource failure, so as to satisfy the user's Quality of Service (QoS) requirements. Job Scheduling with Fault Tolerance in Grid Computing using Ant Colony Optimization is proposed to ensure that jobs are executed successfully even when resource failure has occurred. The technique employed in this paper uses the resource failure rate together with a checkpoint-based rollback recovery strategy. Checkpointing aims to reduce the amount of work lost upon system failure by immediately saving the state of the system. The proposed approach is compared with an existing Ant Colony Optimization (ACO) algorithm. The experimental results show that the implemented fault-tolerant scheduling algorithm improves the user's QoS over the existing ACO algorithm, which has no fault tolerance integrated into it. The performance of the two algorithms was evaluated using three main scheduling metrics: makespan, throughput and average turnaround time.
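The summary names three ingredients (failure-rate-aware ACO resource selection, checkpoint-based rollback recovery, and the makespan/throughput/turnaround metrics) without giving formulas. The Python below is a minimal sketch of those ideas under stated assumptions, not the authors' implementation: it assumes a standard ACO transition rule whose heuristic is the inverse expected execution time scaled by (1 - failure rate); the names `Resource`, `selection_probabilities`, `time_with_checkpoints` and `scheduling_metrics` are hypothetical; checkpoints are assumed to cost nothing and restarts to be instant; and makespan is assumed to run from the earliest submission to the latest completion.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Resource:
    name: str
    speed: float         # work units processed per time unit (hypothetical field)
    failure_rate: float  # observed fraction of failed executions, in [0, 1) (hypothetical field)


def selection_probabilities(pheromone: Sequence[float], resources: Sequence[Resource],
                            job_length: float, alpha: float = 1.0,
                            beta: float = 2.0) -> List[float]:
    """ACO transition rule p_j ~ tau_j^alpha * eta_j^beta. The heuristic eta_j
    favours fast resources and is scaled down by the failure rate
    (an assumed form, not necessarily the paper's)."""
    scores = []
    for tau, r in zip(pheromone, resources):
        expected_time = job_length / r.speed
        eta = (1.0 - r.failure_rate) / expected_time
        scores.append((tau ** alpha) * (eta ** beta))
    total = sum(scores)
    return [s / total for s in scores]


def time_with_checkpoints(job_length: float, speed: float, interval: float,
                          failure_times: Sequence[float]) -> float:
    """Wall-clock time to finish a job that checkpoints after every `interval`
    units of work and, on each failure, rolls back to its last checkpoint.
    Assumes zero checkpoint overhead, instant restart, and strictly
    increasing failure instants."""
    clock = 0.0
    saved = 0.0  # work already committed at the last checkpoint
    failures = iter(failure_times)
    next_fail: Optional[float] = next(failures, None)
    while saved < job_length:
        segment = min(interval, job_length - saved)
        segment_time = segment / speed
        if next_fail is not None and next_fail < clock + segment_time:
            # Failure mid-segment: uncommitted work is lost; resume from `saved`.
            clock = next_fail
            next_fail = next(failures, None)
            continue
        clock += segment_time
        saved += segment  # checkpoint (or job completion) reached
    return clock


def scheduling_metrics(submit: Sequence[float],
                       finish: Sequence[float]) -> Tuple[float, float, float]:
    """Makespan, throughput (jobs per time unit) and average turnaround time."""
    makespan = max(finish) - min(submit)
    throughput = len(finish) / makespan
    avg_turnaround = sum(f - s for s, f in zip(submit, finish)) / len(finish)
    return makespan, throughput, avg_turnaround


if __name__ == "__main__":
    grid = [Resource("r1", speed=1.0, failure_rate=0.0),
            Resource("r2", speed=1.0, failure_rate=0.5)]
    print(selection_probabilities([1.0, 1.0], grid, job_length=10.0))
    # One failure at t=60 on a 100-unit job: checkpoint every 25 units vs. never.
    print(time_with_checkpoints(100.0, 1.0, 25.0, [60.0]))   # 110.0
    print(time_with_checkpoints(100.0, 1.0, 100.0, [60.0]))  # 160.0
    print(scheduling_metrics([0.0, 0.0, 0.0], [10.0, 20.0, 30.0]))
```

With a single failure at t = 60, checkpointing every 25 work units loses only the 10 units done since the checkpoint at t = 50, so the job finishes at 110 instead of 160. In this sketch the failure rate only biases which resource an ant picks; pheromone updates and the choice of checkpoint interval are deliberately left out.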
Conceptualization: HI AEE. Data curation: HI AEE AOA SBJ. Formal analysis: HI AEE AOA SBJ. Investigation: HI AEE AOA SBJ. Methodology: HI SBJ. Project administration: AOA SBJ. Resources: HI AEE AOA SBJ. Software: HI AEE AOE. Supervision: AOA SBJ. Validation: HI AEE AOA SBJ. Visualization: HI AEE. Writing – original draft: HI AEE AOA SBJ. Writing – review & editing: HI AEE AOA SBJ.
Competing Interests: The authors have declared that no competing interests exist.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0177567