Extending the Mochi Methodology to Enable Dynamic HPC Data Services

Bibliographic Details
Published in: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 414-422
Main Authors: Dorier, Matthieu; Carns, Philip; Ross, Robert; Snyder, Shane; Latham, Rob; Gueroudji, Amal; Amvrosiadis, George; Cranor, Chuck; Soumagne, Jerome
Format: Conference Proceeding
Language: English
Published: IEEE, 27.05.2024
Summary: High-performance computing (HPC) applications and workflows are increasingly making use of custom data services to complement traditional parallel file systems with fast transient data management capabilities tailored to application-specific needs. In the Mochi project we provide methodologies and tools that enable rapid development of custom HPC data services, including a collection of composable software components that can be combined to build complex distributed data services. Our initial version of Mochi targeted data services deployed with static configurations with a fixed number of nodes and minimal fault tolerance. However, there is a growing need for dynamic services that can adapt while running in response to changing workloads and system conditions. In this paper we present our work to extend the Mochi architecture to support the development of dynamic data services. We achieve this by providing new Mochi components that support unified bootstrapping and online reconfiguration, fault detection, monitoring, and consensus. We also provide a methodology for deriving service-wide resilience from the resilience of each of the service's components.
DOI: 10.1109/IPDPSW63119.2024.00091