Using Imbalance Characteristic for Fault-Tolerant Workflow Scheduling in Cloud Systems

Resubmission and replication are two fundamental and widely recognized techniques in distributed computing systems for fault tolerance. The resubmission based strategy has an advantage in resource utilization, while the replication based strategy can reduce the task completed time in the context of...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on parallel and distributed systems Vol. 28; no. 12; pp. 3671 - 3683
Main Authors Yao, Guangshun, Ding, Yongsheng, Hao, Kuangrong
Format Journal Article
LanguageEnglish
Published New York IEEE 01.12.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Resubmission and replication are two fundamental and widely recognized techniques in distributed computing systems for fault tolerance. The resubmission based strategy has an advantage in resource utilization, while the replication based strategy can reduce the task completed time in the context of fault. However, few researches take these two techniques together for fault-tolerant workflow scheduling, especially in Cloud systems. In this paper, we present a novel fault-tolerant workflow scheduling (ICFWS) algorithm for Cloud systems by combining the aforementioned two strategies together to play their respective advantages for fault tolerance while trying to meet the soft deadline of workflow. First, it divides the soft deadline of workflow into multiple sub-deadlines for all tasks. Then, it selects a reasonable fault-tolerant strategy and reserves suitable resource for each task by taking the imbalance sub-deadlines among tasks and on-demand resource provisioning of Cloud systems into consideration. Finally, an online scheduling and reservation adjustment scheme is designed to select a suitable resource for the task with resubmission strategy and adjust the sub-deadlines as well as fault-tolerant strategies of some unexecuted tasks during the task execution process, respectively. The proposed algorithm is evaluated on both real-world and randomly generated workflows. The results demonstrate that the ICFWS outperforms some well-known approaches on corresponding metrics.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2017.2687923