Pragma-Controlled Source-to-Source Code Transformations for Robust Application Execution

The most widely used resiliency approach today, based on Checkpoint and Restart (C/R) recovery, is not expected to remain viable in the presence of the accelerated fault and error rates in future Exascale-class systems. In this paper, we introduce a series of pragma directives and the corresponding...

Full description

Saved in:
Bibliographic Details
Published inEuro-Par 2016: Parallel Processing Workshops Vol. 10104; pp. 660 - 670
Main Authors Diniz, Pedro C., Liao, Chunhua, Quinlan, Daniel J., Lucas, Robert F.
Format Book Chapter
LanguageEnglish
Published Switzerland Springer International Publishing AG 2017
Springer International Publishing
SeriesLecture Notes in Computer Science
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The most widely used resiliency approach today, based on Checkpoint and Restart (C/R) recovery, is not expected to remain viable in the presence of the accelerated fault and error rates in future Exascale-class systems. In this paper, we introduce a series of pragma directives and the corresponding source-to-source transformations that are designed to convey to a compiler, and ultimately a fault-aware run-time system, key information about the tolerance to memory errors in selected sections of an application. These directives, implemented in the ROSE compiler infrastructure, convey information about storage mapping and error tolerance but also amelioration and recovery using externally provided functions and multi-threading. We present preliminary results of the use of a subset of these directives for a simple implementation of the conjugate-gradient numerical solver in the presence of uncorrected memory errors, showing that it is possible to implement simple recovery strategies with very low programmer effort and execution time overhead.
ISBN:9783319589428
3319589423
ISSN:0302-9743
1611-3349
DOI:10.1007/978-3-319-58943-5_53