Risk-resilient heuristics and genetic algorithms for security-assured grid job scheduling

In scheduling a large number of user jobs for parallel execution on an open-resource grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in grid job sche...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on computers Vol. 55; no. 6; pp. 703 - 719
Main Authors Song, S., Kai Hwang, Yu-Kwong Kwok
Format Journal Article
LanguageEnglish
Published New York IEEE 01.06.2006
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In scheduling a large number of user jobs for parallel execution on an open-resource grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in grid job scheduling. Three risk-resilient strategies, preemptive, replication, and delay-tolerant, are developed to provide security assurance. We propose six risk-resilient scheduling algorithms to assure secure grid job execution under different risky conditions. We report the simulated grid performances of these new grid job scheduling algorithms under the NAS and PSA workloads. The relative performance is measured by the total job makespan, grid resource utilization, job failure rate, slowdown ratio, replication overhead, etc. In addition to extending from known scheduling heuristics, we developed a new space-time genetic algorithm (STGA) based on faster searching and protected chromosome formation. Our simulation results suggest that, in a wide-area grid environment, it is more resilient for the global job scheduler to tolerate some job delays instead of resorting to preemption or replication or taking a risk on unreliable resources allocated. We find that delay-tolerant min-min and STGA job scheduling have 13-23 percent higher performance than using risky or preemptive or replicated algorithms. The resource overheads for replicated job scheduling are kept at a low 15 percent. The delayed job execution is optimized with a delay factor, which is 20 percent of the total makespan. A Kiviat graph is proposed for demonstrating the quality of grid computing services. These risk-resilient job scheduling schemes can upgrade grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky grid computing environment
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ObjectType-Article-2
ObjectType-Feature-1
content type line 23
ISSN:0018-9340
1557-9956
DOI:10.1109/TC.2006.89