An experimental study about diskless checkpointing

Bibliographic Details
Published in Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204), Vol. 1, pp. 395-402
Main Authors Silva, L.M., Silva, J.G.
Format Conference Proceeding
Language English
Published IEEE 1998
ISBN 9780818686467, 0818686464
ISSN 1089-6503
DOI 10.1109/EURMIC.1998.711832

Summary: Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, the checkpoint data is saved in disk files. However, in some situations the disk operations may incur considerable performance overhead. Alternative solutions use main memory to hold the checkpoint data. The paper presents two main-memory checkpointing schemes that can be used on any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated on a commercial parallel machine. The conclusions drawn clearly show the superiority of one of those schemes.
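
The two schemes named in the summary can be illustrated with a generic sketch. This record does not describe the paper's implementation, so everything below is an assumption for illustration only: checkpoints are modeled as equal-length byte buffers, and the function names (`mirror_to_neighbor`, `parity_of`, `recover`) and the ring layout are hypothetical.

```python
from functools import reduce
from typing import Dict, Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def mirror_to_neighbor(checkpoints: Sequence[bytes]) -> Dict[int, bytes]:
    """Mirroring sketch (assumed ring layout): process (i + 1) % n keeps
    a full copy of process i's checkpoint in its own memory."""
    n = len(checkpoints)
    return {(i + 1) % n: checkpoints[i] for i in range(n)}


def parity_of(checkpoints: Sequence[bytes]) -> bytes:
    """Parity sketch: one buffer holding the XOR of all checkpoints."""
    return reduce(xor_bytes, checkpoints)


def recover(survivors: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing checkpoint from the parity buffer
    and the checkpoints of all surviving processes."""
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    # Hypothetical checkpoint contents for three processes.
    cps = [b"aaaa", b"bbbb", b"cccc"]

    # Mirroring: process 2 holds the copy of process 1's checkpoint.
    held = mirror_to_neighbor(cps)
    assert held[2] == cps[1]

    # Parity: losing process 1 is repaired from the other two plus parity.
    p = parity_of(cps)
    assert recover([cps[0], cps[2]], p) == cps[1]
```

In this sketch, mirroring stores one extra full copy per process, while parity stores a single extra buffer for the whole group but needs every survivor's data to rebuild a lost checkpoint. The sketch says nothing about which scheme the paper found superior.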