To repair or not to repair: Assessing fault resilience in MPI stencil applications

With the increasing size of HPC computations, faults are becoming more and more relevant in the HPC field. The MPI standard does not define the application behaviour after a fault, leaving the burden of fault management to the user, who usually resorts to checkpoint and restart mechanisms. This tren...

Bibliographic Details
Published in Journal of parallel and distributed computing Vol. 205; p. 105156
Main Authors Rocco, Roberto; Boella, Elisabetta; Gregori, Daniele; Palermo, Gianluca
Format Journal Article
Language English
Published Elsevier Inc 01.11.2025