Towards Zero-Waste Recovery and Zero-Overhead Checkpointing in Ensemble Data Assimilation

Bibliographic Details
Published in 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 131-140
Main Authors Keller, Kai; Kestelman, Adrian Cristal; Bautista-Gomez, Leonardo
Format Conference Proceeding
Language English
Published IEEE 01.12.2021
Summary: Ensemble data assimilation is a powerful tool for increasing the accuracy of climatological states. It combines observations with the results of numerical model simulations. The method comprises two steps: (1) the propagation, in which the ensemble states are advanced by the numerical model, and (2) the analysis, in which the model states are corrected with observations. One bottleneck in ensemble data assimilation is circulating the ensemble states between the two steps; often, the states are circulated using files. This article presents an extended implementation of Melissa-DA, an in-memory ensemble data assimilation framework, allowing zero-overhead checkpointing and recovery with little or no recomputation. We hide the checkpoint creation using dedicated threads and MPI processes. We benchmark our implementation with up to 512 members simulating the Lorenz96 model using 10^9 grid points. We utilize up to 8K processes and 8 TB of checkpoint data per cycle and reach a peak performance of 52 teraFLOPS.
ISSN: 2640-0316
DOI: 10.1109/HiPC53243.2021.00027
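
The cycle described in the summary (propagation, analysis, and checkpoint creation hidden behind the next propagation) can be illustrated with a minimal Python sketch. This is not Melissa-DA code and does not use its API. All function names (`propagate`, `analyse`, `checkpoint_writer`, `run`, `recover`) are hypothetical, a single Python thread stands in for the dedicated threads and MPI processes, and the analysis is a deliberately simplified per-grid-point Kalman-style update rather than the filter used in the paper. The Lorenz96 tendency is the standard one, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, with F = 8.

```python
import os
import pickle
import queue
import random
import tempfile
import threading

F = 8.0  # commonly used Lorenz96 forcing constant


def tendency(x):
    """Lorenz96 right-hand side on a cyclic grid."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + F for i in range(n)]


def rk4_step(x, dt=0.01):
    """Advance one state by one classical Runge-Kutta step."""
    k1 = tendency(x)
    k2 = tendency([a + 0.5 * dt * b for a, b in zip(x, k1)])
    k3 = tendency([a + 0.5 * dt * b for a, b in zip(x, k2)])
    k4 = tendency([a + dt * b for a, b in zip(x, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(x, k1, k2, k3, k4)]


def propagate(ensemble, steps=5):
    """Step (1): advance every ensemble member with the numerical model."""
    out = []
    for member in ensemble:
        for _ in range(steps):
            member = rk4_step(member)
        out.append(member)
    return out


def analyse(ensemble, obs, obs_var=1.0):
    """Step (2), simplified (assumption): scalar Kalman-style gain per grid point."""
    m = len(ensemble)
    updated = [list(member) for member in ensemble]
    for i in range(len(obs)):
        mean = sum(member[i] for member in ensemble) / m
        var = sum((member[i] - mean) ** 2 for member in ensemble) / (m - 1)
        gain = var / (var + obs_var)
        for member in updated:
            member[i] += gain * (obs[i] - member[i])
    return updated


def checkpoint_writer(jobs, directory):
    """Background thread: write queued snapshots while the main loop continues."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more checkpoints
            break
        cycle, states = job
        tmp = os.path.join(directory, f"cycle_{cycle}.tmp")
        with open(tmp, "wb") as fh:
            pickle.dump(states, fh)
        # Rename only after the write finished, so recovery never sees a partial file.
        os.replace(tmp, os.path.join(directory, f"cycle_{cycle}.ckpt"))


def run(cycles=3, members=4, n=8, directory=None, seed=0):
    rng = random.Random(seed)
    directory = directory or tempfile.mkdtemp()
    truth = [F + rng.gauss(0, 1) for _ in range(n)]
    ensemble = [[v + rng.gauss(0, 1) for v in truth] for _ in range(members)]
    jobs = queue.Queue()
    writer = threading.Thread(target=checkpoint_writer, args=(jobs, directory))
    writer.start()
    for cycle in range(cycles):
        truth = propagate([truth])[0]
        ensemble = propagate(ensemble)
        obs = [v + rng.gauss(0, 1) for v in truth]  # synthetic observations
        ensemble = analyse(ensemble, obs)
        # Hand a copy to the writer; the next propagation starts immediately.
        jobs.put((cycle, [list(member) for member in ensemble]))
    jobs.put(None)
    writer.join()
    return ensemble, directory


def recover(directory):
    """Load the most recent completed checkpoint: (cycle, ensemble states)."""
    done = [f for f in os.listdir(directory) if f.endswith(".ckpt")]
    cycle = max(int(f[len("cycle_"):-len(".ckpt")]) for f in done)
    with open(os.path.join(directory, f"cycle_{cycle}.ckpt"), "rb") as fh:
        return cycle, pickle.load(fh)
```

Calling `run()` and then `recover(directory)` on the returned directory restores the analysed states of the last completed cycle, so a restart would resume from there instead of recomputing earlier cycles. Because of Python's global interpreter lock, the sketch shows only the overlap pattern and makes no claim about the overhead figures reported in the paper.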