Towards Zero-Waste Recovery and Zero-Overhead Checkpointing in Ensemble Data Assimilation
Published in | 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC) pp. 131 - 140 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2021 |
Subjects | |
Summary: | Ensemble data assimilation is a powerful tool for increasing the accuracy of climatological states. It is based on combining observations with the results from numerical model simulations. The method comprises two steps: (1) the propagation, where the ensemble states are advanced by the numerical model, and (2) the analysis, where the model states are corrected with observations. One bottleneck in ensemble data assimilation is circulating the ensemble states between the two steps. Often, the states are circulated using files. This article presents an extended implementation of Melissa-DA, an in-memory ensemble data assimilation framework, allowing zero-overhead checkpointing and recovery with little or no recomputation. We hide the checkpoint creation using dedicated threads and MPI processes. We benchmark our implementation with up to 512 members simulating the Lorenz96 model using 10^9 grid points. We utilize up to 8K processes and 8 TB of checkpoint data per cycle and reach a peak performance of 52 teraFLOPS. |
---|---|
ISSN: | 2640-0316 |
DOI: | 10.1109/HiPC53243.2021.00027 |
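
The summary mentions hiding checkpoint creation behind dedicated threads while Lorenz96 ensemble members are propagated. The record gives no implementation details, so the following is only a minimal sketch of that general idea and not Melissa-DA's code: a small Lorenz96 ensemble is advanced with forward Euler while a background thread writes the previous cycle's states to disk. All names (`propagate`, `checkpoint_async`, `run_demo`), the JSON file format, the time step, and the ensemble sizes are assumptions made for illustration, and the analysis step is omitted.

```python
import json
import os
import tempfile
import threading


def lorenz96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def propagate(x, dt=0.01, steps=10):
    """Advance one member with forward Euler (chosen for brevity, not accuracy)."""
    for _ in range(steps):
        dx = lorenz96_tendency(x)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def checkpoint_async(ensemble, path):
    """Snapshot the ensemble, then write it to disk on a background thread."""
    # Copy first so that later updates to the ensemble cannot leak into the file.
    snapshot = [list(member) for member in ensemble]

    def write():
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp, path)  # rename into place so readers never see a partial file

    thread = threading.Thread(target=write)
    thread.start()
    return thread, snapshot


def run_demo(members=4, size=8):
    # Hypothetical initial ensemble: each member perturbs the first grid point.
    ensemble = [[8.0 + 0.01 * (m + 1) * (i == 0) for i in range(size)]
                for m in range(members)]
    ensemble = [propagate(x) for x in ensemble]
    path = os.path.join(tempfile.mkdtemp(), "cycle_0.json")
    writer, saved = checkpoint_async(ensemble, path)
    ensemble = [propagate(x) for x in ensemble]  # next cycle overlaps the write
    writer.join()
    with open(path) as f:
        restored = json.load(f)
    return saved, restored, ensemble
```

Calling `run_demo()` returns the snapshot taken at checkpoint time, the states read back from disk, and the ensemble after the overlapping propagation. In this sketch, recovery would mean reloading the file and repeating only the work done since that snapshot.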