Adaptive checkpointing in dynamic grids for uncertain job durations



Bibliographic Details
Published in Information technology interfaces, pp. 585-590
Main Authors Chtepen, M., Dhoedt, B., De Turck, F., Demeester, P., Claeys, F.H.A., Vanrolleghem, P.A.
Format Conference Proceeding
Language English
Published Zagreb IEEE 01.06.2009
University Computing Centre

Summary: Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault tolerance in dynamic and unstable grid environments. The approach allows checkpointing intervals to be modified periodically at run-time, as additional information becomes available. This paper introduces an adaptive algorithm, named MeanFailureCP+, for checkpointing grid applications whose execution times are unknown a priori. The algorithm modifies its parameters based on dynamically collected feedback on its performance. Simulation results show that the new algorithm performs even better than adaptive approaches that make use of exact information on job execution times.
ISBN: 9789537138158, 9537138151
ISSN: 1330-1012
DOI: 10.1109/ITI.2009.5196152