Learning-based dynamic reliability management for dark silicon processor considering EM effects

Bibliographic Details
Published in Proceedings - Design, Automation, and Test in Europe Conference and Exhibition, pp. 463-468
Main Authors Taeyoung Kim, Xin Huang, Hai-Bao Chen, Valeriy Sukharev, Sheldon X.-D. Tan
Format Conference Proceeding Journal Article
Language English
Published EDAA 01.03.2016
Summary: In this article, we propose a new dynamic reliability management (DRM) technique for emerging dark silicon manycore processors. We formulate the DRM problem as minimizing energy consumption subject to reliability, performance, and thermal constraints. The approach builds on a newly proposed physics-based electromigration (EM) reliability model that predicts the EM reliability of full-chip power grid networks. We take thermal design power (TDP) as the power constraint for a dark silicon manycore processor, and we use two control knobs: dynamic voltage and frequency scaling (DVFS) and ON/OFF pulsing of dark silicon cores. To solve the problem, we apply an adaptive Q-learning based method, which suits runtime operation because it provides good solutions at low cost. A large class of multithreaded applications serves as the benchmark to validate and compare the proposed DRM methods. Experimental results on a 64-core dark silicon chip show that the proposed DRM algorithm effectively reduces the energy consumption of a dark silicon manycore system when the system is not tightly constrained, and that in this case it significantly outperforms a simple global DVFS method.
ISSN: 1558-1101
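
The control loop described in the summary (pick a DVFS level and a number of ON cores, observe the energy cost, learn from it) can be illustrated with a minimal sketch. This is plain tabular Q-learning on a toy model, not the adaptive variant or the EM model from the paper. Every number and name here is an assumption made for illustration: the frequency levels, the cubic power model, the cyclic workload phases, and a single `POWER_CAP` that stands in for the TDP, thermal, and reliability constraints together. In this toy the action does not influence the next state, so it shows only the update rule and the constrained-energy reward, not a realistic thermal or EM stress dynamic.

```python
import itertools
import random

# All numbers below are illustrative assumptions, not values from the paper.
FREQS = [0.5, 0.75, 1.0]        # normalized DVFS frequency levels
CORES = [16, 32, 48, 64]        # cores switched ON (the rest stay dark)
ACTIONS = list(itertools.product(FREQS, CORES))
DEMAND = [8.0, 20.0, 36.0]      # required throughput per workload phase (the state)
POWER_CAP = 40.0                # hypothetical stand-in for TDP/thermal/reliability limits
PENALTY = 100.0                 # cost charged when any constraint is violated


def power(freq, cores):
    # First-order DVFS model: voltage scales with frequency, so power ~ f^3.
    return cores * freq ** 3


def feasible(state, action):
    freq, cores = action
    meets_performance = freq * cores >= DEMAND[state]
    return meets_performance and power(freq, cores) <= POWER_CAP


def reward(state, action):
    # Negative energy per control interval, or a flat penalty if infeasible.
    return -power(*action) if feasible(state, action) else -PENALTY


def train(steps=60000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Zero init is optimistic here because every reward is negative.
    q = [[0.0] * len(ACTIONS) for _ in DEMAND]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore
            a = rng.randrange(len(ACTIONS))
        else:                                           # exploit
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        nxt = (state + 1) % len(DEMAND)                 # phases simply cycle
        target = reward(state, ACTIONS[a]) + gamma * max(q[nxt])
        q[state][a] += alpha * (target - q[state][a])   # Q-learning update
        state = nxt
    return q


def greedy_policy(q):
    n = len(ACTIONS)
    return [ACTIONS[max(range(n), key=lambda i: q[s][i])] for s in range(len(DEMAND))]


policy = greedy_policy(train())
for phase, (freq, cores) in enumerate(policy):
    print(f"phase {phase}: f={freq}, ON cores={cores}")
```

Under these assumed numbers, the cheapest settings that meet both the throughput demand and the power cap are 16 cores at 0.5 for the lightest phase, 48 cores at 0.5 for the middle phase, and 48 cores at 0.75 for the heaviest phase, so the learned policy should settle on many slow cores over a few fast ones. A more realistic state would include temperature and accumulated EM stress, which the chosen action would then affect.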