A fault-tolerant last level cache for CMPs operating at ultra-low voltage


Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 125, pp. 31-44
Main Authors: Ferrerón, Alexandra; Alastruey-Benedé, Jesús; Suárez Gracia, Darío; Monreal Arnal, Teresa; Ibáñez Marín, Pablo; Viñals Yúfera, Víctor
Format: Journal Article, Publication
Language: English
Published: Elsevier Inc, 01.03.2019
Elsevier
Summary: Voltage scaling to values near the threshold voltage is a promising technique to hold off the many-core power wall. However, as voltage decreases, some SRAM cells are unable to operate reliably and show a behavior consistent with a hard fault. Block disabling is a micro-architectural technique that allows low-voltage operation by deactivating faulty cache entries, at the expense of reducing the effective cache capacity. In the case of the last-level cache, this capacity reduction leads to an increase in off-chip memory accesses, diminishing the overall energy benefit of reducing the voltage supply. In this work, we exploit the reuse locality and the intrinsic redundancy of multi-level inclusive hierarchies to enhance the performance of block disabling with negligible cost. The proposed fault-aware last-level cache management policy maps critical blocks, those not present in private caches and with a higher probability of being reused, to active cache entries. Our evaluation shows that this fault-aware management results in up to 37.3% and 54.2% fewer misses per kilo instruction (MPKI) than block disabling for multiprogrammed and parallel workloads, respectively. This translates to performance enhancements of up to 13% and 34.6% for multiprogrammed and parallel workloads, respectively.

Highlights:
• Fault-Tolerant Last Level Cache for CMPs Operating at Ultra-Low Voltage.
• Mechanism that exploits redundancy and reuse to enhance block disabling performance.
• Fault-aware LLC management that maps critical blocks to operative cache entries.
• Detailed evaluation of block disabling techniques in a shared-memory coherent CMP.
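To make the capacity cost of block disabling mentioned in the summary concrete, the following is a minimal sketch of the arithmetic, not the authors' model. It assumes that cell failures are independent, that a block is disabled if any of its data bits is faulty, and that tag bits are ignored. The function names, the 64-byte block size, the 8 MiB cache size, and the per-cell failure probabilities are all illustrative assumptions that do not come from the record.

```python
def fault_free_block_fraction(p_cell_fail: float, block_bits: int = 512) -> float:
    """Probability that every bit of a block works, assuming independent
    cell failures (an assumption of this sketch, not a claim from the paper)."""
    return (1.0 - p_cell_fail) ** block_bits


def effective_capacity_bytes(total_bytes: int, p_cell_fail: float,
                             block_bytes: int = 64) -> float:
    """Expected usable capacity when any block with a faulty bit is disabled.
    Tag and metadata bits are ignored for simplicity."""
    n_blocks = total_bytes // block_bytes
    usable_fraction = fault_free_block_fraction(p_cell_fail, block_bytes * 8)
    return n_blocks * block_bytes * usable_fraction


if __name__ == "__main__":
    llc_bytes = 8 * 2**20  # hypothetical 8 MiB last-level cache
    for p in (1e-4, 1e-3, 1e-2):  # hypothetical per-cell failure probabilities
        frac = fault_free_block_fraction(p)
        cap_mib = effective_capacity_bytes(llc_bytes, p) / 2**20
        print(f"p_cell_fail={p:g}: usable blocks={frac:.1%}, capacity={cap_mib:.2f} MiB")
```

Under these assumptions, even a small per-cell failure probability disables a large share of 64-byte blocks, which is the capacity loss that the fault-aware policy described in the summary aims to mitigate.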
ISSN: 0743-7315, 1096-0848
DOI: 10.1016/j.jpdc.2018.10.010