Diskless Checkpointing with Rollback-Dependency Trackability

Bibliographic Details
Published in	Proceedings - Symposium on Reliable Distributed Systems pp. 275 - 281
Main Authors	Menderico, R M, Garcia, I C
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2010
Subjects	availability, Checkpointing, Clouds, dependability, distributed algorithms, Fault tolerance, Fault tolerant systems, Protocols, Servers, Synchronization
ISBN	9780769542508 0769542506
ISSN	1060-9857
DOI	10.1109/SRDS.2010.17

More Information
Summary:	One way to implement fault-tolerant applications is to store their current state in stable memory and, when a failure occurs, restart the application from the last consistent global state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used. In this approach, a failed process's state can be determined by accessing only the memory of non-faulty processes. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed, and experimental results were obtained from a commercial cloud environment.
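The summary names Rollback-Dependency Trackability (RDT) but does not show how a quasi-synchronous protocol enforces it. The Python below is a minimal sketch of FDAS (Fixed-Dependency-After-Send), a well-known RDT protocol from the literature. It is not necessarily the rule RDT-Diskless uses. The class names, fields, and message format are hypothetical.

Each process works as follows:

* It piggybacks a dependency vector on every message.
* It takes a forced checkpoint before delivering a message that carries a new dependency, if it has already sent a message in the current checkpoint interval.

Checkpoints are modeled only by their saved dependency vectors. State storage, diskless encoding across processes, and garbage collection are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    sender: int
    payload: object
    dv: List[int]  # hypothetical: sender's dependency vector piggybacked on the message


@dataclass
class Process:
    pid: int
    n: int  # number of processes in the system
    dv: List[int] = field(default_factory=list)
    sent_in_interval: bool = False  # sent anything since the last checkpoint?
    checkpoints: List[List[int]] = field(default_factory=list)  # saved DVs stand in for saved state
    forced: int = 0  # count of forced (protocol-induced) checkpoints

    def __post_init__(self):
        self.dv = [0] * self.n
        self.checkpoint()  # initial checkpoint, not counted as forced

    def checkpoint(self, forced: bool = False) -> None:
        # Save the current dependency vector, then start a new checkpoint interval.
        self.checkpoints.append(list(self.dv))
        self.dv[self.pid] += 1
        self.sent_in_interval = False
        if forced:
            self.forced += 1

    def send(self, payload: object) -> Message:
        self.sent_in_interval = True
        return Message(self.pid, payload, list(self.dv))

    def receive(self, msg: Message) -> object:
        # FDAS rule: once this interval has a send, a message that brings
        # a new dependency forces a checkpoint before it is delivered.
        new_dep = any(m > d for m, d in zip(msg.dv, self.dv))
        if self.sent_in_interval and new_dep:
            self.checkpoint(forced=True)
        # Merge the piggybacked dependencies (component-wise maximum).
        self.dv = [max(m, d) for m, d in zip(msg.dv, self.dv)]
        return msg.payload
```

Consider this sequence: process 1 sends a message, and process 0 then sends its own message before receiving process 1's. In that case process 0 takes exactly one forced checkpoint before delivery. A process that has not sent anything in its current interval simply merges the incoming dependency vector.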