Diskless Checkpointing with Rollback-Dependency Trackability

Bibliographic Details
Published in: Proceedings - Symposium on Reliable Distributed Systems, pp. 275-281
Main Authors: Menderico, R M; Garcia, I C
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2010
ISBN: 9780769542508, 0769542506
ISSN: 1060-9857
DOI: 10.1109/SRDS.2010.17

Summary: One way to implement fault-tolerant applications is to store the application's current state in stable memory and, when a failure occurs, restart the application from the last consistent global state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used, in which a failed process's state can be determined by accessing only the memory of non-faulty processes. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed, and experimental results were obtained from a commercial cloud environment.