Extending the Mochi Methodology to Enable Dynamic HPC Data Services
Published in | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) pp. 414 - 422 |
---|---|
Main Authors | , , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 27.05.2024 |
Summary: | High-performance computing (HPC) applications and workflows are increasingly making use of custom data services to complement traditional parallel file systems with fast transient data management capabilities tailored to application-specific needs. In the Mochi project we provide methodologies and tools that enable rapid development of custom HPC data services, including a collection of composable software components that can be combined to build complex distributed data services. Our initial version of Mochi targeted data services deployed with static configurations with a fixed number of nodes and minimal fault tolerance. However, there is a growing need for dynamic services that can adapt while running in response to changing workloads and system conditions. In this paper we present our work to extend the Mochi architecture to support the development of dynamic data services. We achieve this by providing new Mochi components that support unified bootstrapping and online reconfiguration, fault detection, monitoring, and consensus. We also provide a methodology for deriving service-wide resilience from the resilience of each of the service's components. |
---|---|
DOI: | 10.1109/IPDPSW63119.2024.00091 |