To repair or not to repair: Assessing fault resilience in MPI stencil applications
With the increasing size of HPC computations, faults are becoming more and more relevant in the HPC field. The MPI standard does not define the application behaviour after a fault, leaving the burden of fault management to the user, who usually resorts to checkpoint and restart mechanisms. This tren...
Published in | Journal of parallel and distributed computing Vol. 205; p. 105156 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.11.2025 |