An experimental study about diskless checkpointing
Published in | Proceedings. 24th EUROMICRO Conference (Cat. No.98EX204) Vol. 1; pp. 395 - 402 vol.1 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 1998 |
ISBN | 9780818686467 0818686464 |
ISSN | 1089-6503 |
DOI | 10.1109/EURMIC.1998.711832 |
Summary: | Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, the checkpoint data is saved to disk files. However, in some situations the disk operations may result in considerable performance overhead. Alternative solutions use main memory to hold the checkpoint data. The paper presents two main-memory checkpointing schemes that can be used on any parallel machine without requiring any change to the hardware: one scheme saves the checkpoints in the memory of other processors, while the other is based on a parity approach. Both techniques have been implemented and evaluated on a commercial parallel machine. The results clearly show the superiority of one of the schemes. |
---|---|
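
The summary names two in-memory schemes without describing their internals. The sketch below is only an illustration of the general ideas, not the paper's implementation: it assumes a ring layout for the "memory of other processors" scheme and a single XOR parity block for the parity scheme. The names `buddy_of`, `parity_of`, and `recover_lost` are hypothetical, and all checkpoints are assumed to have equal length.

```python
from functools import reduce
from typing import Sequence


def buddy_of(rank: int, n_procs: int) -> int:
    """Assumed ring layout: processor `rank` keeps a copy of its
    checkpoint in the memory of the next processor."""
    return (rank + 1) % n_procs


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(checkpoints: Sequence[bytes]) -> bytes:
    """Parity block: XOR of every processor's checkpoint."""
    return reduce(xor_bytes, checkpoints)


def recover_lost(survivors: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing checkpoint by XOR-ing the parity
    block with all surviving checkpoints."""
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    ckpts = [b"aaaa", b"bbbb", b"cccc"]   # one checkpoint per processor
    parity = parity_of(ckpts)             # held in a parity processor's memory
    rebuilt = recover_lost([ckpts[0], ckpts[2]], parity)  # processor 1 failed
    print(rebuilt == ckpts[1])
```

Under these assumptions, the ring copy costs one full extra checkpoint of memory per processor, while the parity block costs one checkpoint in total but tolerates only a single failure at a time.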