Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy
Published in | IEEE transactions on very large scale integration (VLSI) systems Vol. 25; no. 10; pp. 2803 - 2816 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | IEEE, 01.10.2017 |
Summary: | Reliability issues are more severe in multi/many-core systems because more devices are integrated in advanced technology nodes. To achieve robust computing in nanoscale designs, many circuit-level and architecture-level redundancy techniques have been proposed, but they impose a large fixed silicon area overhead and lack flexibility. In recent years, some methods have exploited the "inherent core redundancy" of many-core systems to implicitly implement N-modular redundant (NMR) subsystems and so achieve area-efficient fault-tolerant computing. However, when soft error rate, task vulnerability, and task significance differ across the many-core system, existing core-level redundancy methods become ineffective. To achieve robust computation in many-core systems with intercore variations and mixed workloads, we propose a variation-aware core-level redundancy scheme. The scheme presents two novel approaches: 1) we construct NMR tables that store the degree of redundancy, using mathematical models for systems affected by these variations, and 2) we dynamically allocate each replicated task to a suitable core with variation-aware mapping algorithms to achieve high reliability. Based on a modified multicore simulator, Sniper-Transient Error Process Variation (TEVR), the experimental results show that the proposed scheme can increase reliability by 47.92% and achieve an energy saving of 39% compared with conventional core-level redundancy methods. |
---|---|
ISSN: | 1063-8210 1557-9999 |
DOI: | 10.1109/TVLSI.2017.2715803 |
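
The summary mentions NMR tables that store a degree of redundancy derived from mathematical models. The paper's actual models are not given in this record, so the following is only a minimal sketch under stated assumptions: replicas fail independently, each replica is correct with the same probability `r`, and a perfect majority voter is used (the textbook NMR reliability formula). The function names `nmr_reliability`, `min_redundancy`, and `build_nmr_table`, the cap `max_n`, and the per-core reliability values are all hypothetical and chosen for illustration.

```python
from math import comb
from typing import Dict, Optional


def nmr_reliability(r: float, n: int) -> float:
    """Probability that a strict majority of n independent replicas is correct.

    Assumes every replica is correct with probability r and the voter is perfect.
    """
    majority = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(majority, n + 1))


def min_redundancy(r: float, target: float, max_n: int = 9) -> Optional[int]:
    """Smallest odd replica count n <= max_n whose NMR reliability meets target.

    Returns None when no odd n up to max_n is sufficient.
    """
    for n in range(1, max_n + 1, 2):
        if nmr_reliability(r, n) >= target:
            return n
    return None


def build_nmr_table(core_reliability: Dict[str, float], target: float) -> Dict[str, Optional[int]]:
    """Map each core (hypothetical identifiers) to the redundancy degree it would need."""
    return {core: min_redundancy(r, target) for core, r in core_reliability.items()}


if __name__ == "__main__":
    # Hypothetical per-core reliabilities standing in for intercore variation.
    cores = {"core0": 0.99, "core1": 0.90}
    print(build_nmr_table(cores, target=0.97))
```

In this sketch a less reliable core needs a higher degree of redundancy to reach the same target, which is the kind of variation a per-core table is meant to capture. How the published scheme weights task vulnerability and task significance is not stated here and is not modeled.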