PREACHES - portable recovery and checkpointing in heterogeneous systems

Bibliographic Details
Published in Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224), pp. 38-47
Main Authors Ssu, Kuo-Feng; Fuchs, W.K.
Format Conference Proceeding
Language English
Published IEEE 1998
Summary: Checkpointing in a homogeneous environment, where both checkpointing and recovery are performed on the same type of machine and operating system, has been studied extensively. As heterogeneous distributed systems become pervasive, it is desirable to extend the capability of checkpointing to non-homogeneous environments. This paper describes a prototype, PREACHES, that achieves portable checkpointing of single-process applications in heterogeneous systems using checkpoint propagation. The checkpoint propagation technique generates a machine-dependent checkpoint for each distinct architecture in the heterogeneous environment. When a failure occurs, the failed process can be restarted on a specified machine using the checkpoint that matches that machine's architecture. PREACHES has been implemented on a heterogeneous network of workstations using TCP/IP communication. It also provides automatic and fast recovery for single-process programs.
ISBN:9780818684708
0818684704
ISSN:0731-3071
2375-124X
DOI:10.1109/FTCS.1998.689453
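
Illustrative sketch (not from the paper): the summary describes checkpoint propagation only at a high level, so the following toy Python model is an assumption-laden illustration of the idea, not the PREACHES implementation. The architecture names, the byte-order table, and the functions `propagate`, `run`, and `recover` are all hypothetical. Each checkpoint is written once per architecture in that architecture's assumed byte order, and recovery reads the copy that matches the target machine.

```python
import struct

# Hypothetical architecture table: name -> struct byte-order prefix.
# Real machine-dependent checkpoints differ in far more than byte order.
ARCH_BYTE_ORDER = {"sparc": ">", "x86": "<"}


def encode_checkpoint(arch, step, total):
    """Pack the toy process state (step, total) in the format assumed for `arch`."""
    return struct.pack(ARCH_BYTE_ORDER[arch] + "iq", step, total)


def decode_checkpoint(arch, blob):
    """Unpack a checkpoint that was written for `arch`."""
    return struct.unpack(ARCH_BYTE_ORDER[arch] + "iq", blob)


def propagate(step, total):
    """Generate one machine-dependent checkpoint per architecture."""
    return {arch: encode_checkpoint(arch, step, total) for arch in ARCH_BYTE_ORDER}


def run(n, start=(0, 0), checkpoint_every=3, fail_at=None):
    """Sum the integers 0..n-1, checkpointing periodically.

    Returns (result, checkpoints); result is None if a failure was simulated.
    """
    step, total = start
    store = propagate(step, total)
    while step < n:
        if fail_at is not None and step == fail_at:
            return None, store  # simulated crash; only the checkpoints survive
        total += step
        step += 1
        if step % checkpoint_every == 0:
            store = propagate(step, total)
    return total, store


def recover(store, target_arch, n):
    """Restart on `target_arch` from the checkpoint written for that architecture."""
    step, total = decode_checkpoint(target_arch, store[target_arch])
    result, _ = run(n, start=(step, total))
    return result
```

In this model, `run(10, fail_at=7)` stops with the checkpoints taken at step 6, and `recover(store, "x86", 10)` or `recover(store, "sparc", 10)` completes the same computation from the copy matching the chosen machine. The trade-off it illustrates is that each checkpoint costs one encoding per architecture up front, in exchange for needing no format conversion at recovery time.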