Diskless Checkpointing with Rollback-Dependency Trackability
Published in | Proceedings - Symposium on Reliable Distributed Systems pp. 275 - 281 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.10.2010 |
ISBN | 9780769542508 0769542506 |
ISSN | 1060-9857 |
DOI | 10.1109/SRDS.2010.17 |
Summary: | One way to implement fault-tolerant applications is to store their current state in stable memory and, when a failure occurs, restart the application from the last consistent global state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used, in which a failed process's state is recovered by accessing only the memory of non-faulty processes. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed, and experimental results were obtained in a commercial cloud environment. |
---|---|
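
The summary's central idea, that a failed process's state can be rebuilt using only the memory of non-faulty processes, can be illustrated with XOR parity, an encoding commonly used in diskless checkpointing. The sketch below is not the RDT-Diskless algorithm or the Cheops framework, whose details this record does not give. The function names and the sample states are hypothetical, and the sketch assumes a single failure and equal-length state snapshots.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_checkpoint(states: list[bytes]) -> bytes:
    """Encode all process states into one in-memory parity block."""
    return reduce(xor_bytes, states)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing state from the surviving states and the parity."""
    return reduce(xor_bytes, survivors, parity)


# Hypothetical equal-length state snapshots of three processes.
states = [b"P0-state", b"P1-state", b"P2-state"]

# The parity block would live in the memory of a separate, non-faulty process.
parity = parity_checkpoint(states)

# Process 1 fails; its state is rebuilt without touching stable storage.
recovered = recover([states[0], states[2]], parity)
```

With one parity block, only one simultaneous failure can be tolerated, which matches the summary's premise that the number of simultaneous failures is expected to be small.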