VCCP: A transparent, coordinated checkpointing system for virtualization-based cluster computing
Virtual machine, which typically consists of a guest operating system (OS) and its serial applications, can be checkpointed, migrated to another cluster node, and restarted later to its previous saved state. However, to date, it is nontrivial to provide checkpoint-restart mechanisms with the same le...
Saved in:
Published in | 2009 IEEE International Conference on Cluster Computing and Workshops pp. 1 - 10 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
01.08.2009
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Virtual machine, which typically consists of a guest operating system (OS) and its serial applications, can be checkpointed, migrated to another cluster node, and restarted later to its previous saved state. However, to date, it is nontrivial to provide checkpoint-restart mechanisms with the same level of transparency for distributed applications running on a cluster of virtual machines. To address this particular issue, we have created the Virtual Cluster CheckPointing (VCCP) system, a novel system for transparent coordinated checkpoint-restart of virtual machines and its distributed application on commodity clusters. In this paper, we detail the design and implementation of the VCCP system. Our VCCP prototype extends the open source QEMU system with kqemu module by implementing hypervisor-based Coordinated Checkpoint-Restart protocols. To verify and validate our prototype, we measured its performance using the NAS parallel benchmark. Our experimental results indicate that VCCP generates less than 1% of additional execution overhead for non-communication intensive parallel applications. Furthermore, our correctness analysis shows that VCCP does not cause message loss or reordering, which is a necessary property to ensure correctness of checkpoint-restart mechanism. Finally, we believe that VCCP is a promising checkpoint-restart alternative for legacy applications that have implemented traditional process-level checkpoint-restart. |
---|---|
ISBN: | 9781424450114 142445011X |
ISSN: | 1552-5244 2168-9253 |
DOI: | 10.1109/CLUSTR.2009.5289183 |