A survey on modeling and improving reliability of DNN algorithms and accelerators

Bibliographic Details
Published in: Journal of systems architecture, Vol. 104; p. 101689
Main Author: Mittal, Sparsh
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.03.2020
Summary: As DNNs become increasingly common in mission-critical applications, ensuring their reliable operation has become crucial. Conventional resilience techniques fail to account for the unique characteristics of DNN algorithms and accelerators, and hence they are infeasible or ineffective. In this paper, we present a survey of techniques for studying and optimizing the reliability of DNN accelerators and architectures. The reliability issues we cover include soft and hard errors arising from process variation, voltage scaling, timing errors, DRAM errors due to refresh rate scaling, thermal effects, etc. We organize the research projects into several categories to bring out their key attributes. This paper underscores the importance of designing for reliability as a first principle, and not merely retrofitting for it.
ISSN: 1383-7621
DOI: 10.1016/j.sysarc.2019.101689