VCCP: A transparent, coordinated checkpointing system for virtualization-based cluster computing

Bibliographic Details
Published in: 2009 IEEE International Conference on Cluster Computing and Workshops, pp. 1-10
Main Authors: Ong, H., Saragol, N., Chanchio, K., Leangsuksun, C.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2009
Summary: A virtual machine, which typically consists of a guest operating system (OS) and its serial applications, can be checkpointed, migrated to another cluster node, and later restarted from its saved state. However, to date, it is nontrivial to provide checkpoint-restart mechanisms with the same level of transparency for distributed applications running on a cluster of virtual machines. To address this issue, we have created the Virtual Cluster CheckPointing (VCCP) system, a novel system for transparent, coordinated checkpoint-restart of virtual machines and their distributed applications on commodity clusters. In this paper, we detail the design and implementation of the VCCP system. Our VCCP prototype extends the open-source QEMU system with the kqemu module by implementing hypervisor-based coordinated checkpoint-restart protocols. To verify and validate our prototype, we measured its performance using the NAS parallel benchmark. Our experimental results indicate that VCCP adds less than 1% execution overhead for non-communication-intensive parallel applications. Furthermore, our correctness analysis shows that VCCP does not cause message loss or reordering, a property necessary for the correctness of a checkpoint-restart mechanism. Finally, we believe that VCCP is a promising checkpoint-restart alternative for legacy applications that have implemented traditional process-level checkpoint-restart.
ISBN: 9781424450114, 142445011X
ISSN: 1552-5244, 2168-9253
DOI: 10.1109/CLUSTR.2009.5289183