A fault-tolerant last level cache for CMPs operating at ultra-low voltage

Bibliographic Details
Published in	Journal of parallel and distributed computing Vol. 125; pp. 31 - 44
Main Authors	Ferrerón, Alexandra; Alastruey-Benedé, Jesús; Suárez Gracia, Darío; Monreal Arnal, Teresa; Ibáñez Marín, Pablo; Viñals Yúfera, Víctor
Format	Journal Article Publication
Language	English
Published	Elsevier Inc, 01.03.2019
Subjects	Computer architecture; Cache management; Fault-tolerance; Fault-tolerant computing; Computer science; Multiprogramming (Electronic computers); Near-threshold voltage; On-chip caches; SRAM reliability; Fault tolerance (Computer science); UPC subject areas

More Information
Summary:	Voltage scaling to values near the threshold voltage is a promising technique to hold off the many-core power wall. However, as voltage decreases, some SRAM cells are unable to operate reliably and show a behavior consistent with a hard fault. Block disabling is a micro-architectural technique that allows low-voltage operation by deactivating faulty cache entries, at the expense of reducing the effective cache capacity. In the case of the last-level cache, this capacity reduction leads to an increase in off-chip memory accesses, diminishing the overall energy benefit of reducing the voltage supply. In this work, we exploit the reuse locality and the intrinsic redundancy of multi-level inclusive hierarchies to enhance the performance of block disabling with negligible cost. The proposed fault-aware last-level cache management policy maps critical blocks, those not present in private caches and with a higher probability of being reused, to active cache entries. Our evaluation shows that this fault-aware management results in up to 37.3% and 54.2% fewer misses per kilo instruction (MPKI) than block disabling for multiprogrammed and parallel workloads, respectively. This translates to performance enhancements of up to 13% and 34.6% for multiprogrammed and parallel workloads, respectively.
• Fault-Tolerant Last Level Cache for CMPs Operating at Ultra-Low Voltage.
• Mechanism that exploits redundancy and reuse to enhance block disabling performance.
• Fault-aware LLC management that maps critical blocks to operative cache entries.
• Detailed evaluation of block disabling techniques in a shared-memory coherent CMP.
ISSN:	0743-7315 1096-0848
DOI:	10.1016/j.jpdc.2018.10.010
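
The capacity cost of block disabling mentioned in the summary can be illustrated with a small back-of-the-envelope sketch. It is not taken from the article: the function names, the 512-bit (64-byte) block size, and the assumption that SRAM cells fail independently with a fixed probability are all illustrative choices. The `is_critical` helper only restates the summary's definition of a critical block (not present in private caches and likely to be reused); how the article actually predicts reuse is not described here.

```python
def block_fault_probability(p_cell: float, block_bits: int = 512) -> float:
    """Probability that a block has at least one faulty cell.

    Assumes (hypothetically) that cells fail independently with
    probability p_cell; block_bits=512 models a 64-byte block.
    """
    return 1.0 - (1.0 - p_cell) ** block_bits


def effective_capacity(total_blocks: int, p_cell: float, block_bits: int = 512) -> float:
    """Expected number of usable blocks when every faulty block is disabled."""
    return total_blocks * (1.0 - block_fault_probability(p_cell, block_bits))


def is_critical(in_private_cache: bool, likely_reused: bool) -> bool:
    """Summary's notion of a critical block: absent from private caches
    and with a higher probability of being reused."""
    return (not in_private_cache) and likely_reused


if __name__ == "__main__":
    # Illustrative values only; not measurements from the article.
    for p in (1e-5, 1e-4, 1e-3):
        usable = effective_capacity(total_blocks=131072, p_cell=p)
        print(f"p_cell={p:g}: about {usable:.0f} of 131072 blocks usable")
```

Under these assumptions, even a modest per-cell fault probability disables a large share of blocks, which is why a policy that reserves the remaining operative entries for critical blocks can matter.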