Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection

Bibliographic Details
Published in: Code Generation and Optimization: Proceedings of the International Symposium on Code Generation and Optimization, 11-14 Mar. 2007, pp. 244-258
Main Authors: Wang, Cheng; Kim, Ho-seop; Wu, Youfeng; Ying, Victor
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 11.03.2007
IEEE
Series: ACM Conferences
Summary: As transistors become smaller and faster with tighter noise margins, modern processors are becoming increasingly susceptible to transient hardware faults. Existing Hardware-based Redundant Multi-Threading (HRMT) approaches rely mostly on special-purpose hardware to replicate the program into redundant execution threads and compare their computation results. In this paper, we present a Software-based Redundant Multi-Threading (SRMT) approach for transient fault detection. Our SRMT technique uses the compiler to automatically generate redundant threads that can run on general-purpose chip multi-processors (CMPs). We exploit high-level program information available at compile time to optimize data communication between the redundant threads. Furthermore, our software-based technique provides a flexible program execution environment in which legacy binary code and reliability-enhanced code can coexist in a mix-and-match fashion, depending on the desired level of reliability and software compatibility. Our experimental results show that compiler analysis and optimization techniques can reduce the data communication requirement by up to 88% compared with HRMT. With general-purpose intra-chip communication mechanisms on CMP machines, SRMT overhead can be as low as 19%. Moreover, SRMT achieves error coverage rates of 99.98% and 99.6% for the SPEC CPU2000 integer and floating-point benchmarks, respectively. These results demonstrate that SRMT is competitive with HRMT approaches.
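A toy model can illustrate the redundant-thread scheme the summary describes. The Python sketch below is hypothetical and is not the authors' compiler output. It assumes a single "leading" thread that performs all memory accesses and forwards each loaded value, plus each store's address and value, through a FIFO queue to a "trailing" thread. The trailing thread recomputes the result and flags any mismatch. The names `compute`, `leading`, `trailing`, `run_srmt`, and the `inject_fault_at` bit-flip parameter are illustrative inventions. Python threads stand in for the hardware threads of a CMP.

```python
import queue
import threading

# Hypothetical toy model of software redundant multi-threading (not the paper's code).

def compute(x: int) -> int:
    # The replicated computation: both threads execute this.
    return x * x + 1

def leading(inputs, q, memory, inject_fault_at=None):
    for addr, x in enumerate(inputs):
        q.put(("load", x))              # forward the loaded value to the trailing thread
        result = compute(x)
        if addr == inject_fault_at:
            result ^= 1                 # simulate a transient single-bit flip
        memory[addr] = result           # only the leading thread writes memory
        q.put(("store", addr, result))  # forward store address and value for checking
    q.put(("done",))

def trailing(q, errors):
    loaded = None
    while True:
        msg = q.get()
        if msg[0] == "done":
            return
        if msg[0] == "load":
            loaded = msg[1]             # reuse the leading thread's load instead of re-reading memory
        else:
            _, addr, value = msg
            if value != compute(loaded):  # redundant recomputation and comparison
                errors.append(addr)       # mismatch: transient fault detected

def run_srmt(inputs, inject_fault_at=None):
    q = queue.Queue()                   # FIFO, so each load precedes its matching store
    memory, errors = {}, []
    t_lead = threading.Thread(target=leading, args=(inputs, q, memory, inject_fault_at))
    t_trail = threading.Thread(target=trailing, args=(q, errors))
    t_lead.start()
    t_trail.start()
    t_lead.join()
    t_trail.join()
    return memory, errors
```

For example, `run_srmt([1, 2, 3])` returns an empty error list, while `run_srmt([1, 2, 3], inject_fault_at=1)` reports `[1]`. The sketch only detects the fault (the corrupted value has already reached `memory`), and it forwards every load and store. It does not model the compile-time analysis that the summary credits with reducing communication between the threads.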
ISBN: 9780769527642 (ISBN-13); 0769527647 (ISBN-10)
DOI: 10.1109/CGO.2007.7