Handling Stuck-at-Fault Defects Using Matrix Transformation for Robust Inference of DNNs


Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, No. 10, pp. 2448-2460
Main Authors: Zhang, Baogang; Uysal, Necati; Fan, Deliang; Ewetz, Rickard
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2020
Summary: Matrix-vector multiplication is the dominating computational workload in the inference phase of deep neural networks (DNNs). Memristor crossbar arrays (MCAs) can efficiently perform matrix-vector multiplication in the analog domain. A key challenge is that memristor devices may suffer from stuck-at-fault defects, which can severely degrade classification accuracy. Earlier studies have shown that the accuracy loss can be recovered by utilizing additional hardware or hardware-aware training. In this article, we propose a framework that handles stuck-at-faults using matrix transformations, called the MT framework. The framework introduces a cost metric that captures the negative impact of the stuck-at-fault defects. The cost metric is then minimized by applying matrix transformations T, where a transformation T changes a weight matrix W into a new weight matrix W̃ = T(W). In particular, a row flipping transformation, a permutation transformation, and a value range transformation are proposed. The row flipping transformation translates stuck-off (stuck-on) faults into stuck-on (stuck-off) faults. The permutation transformation maps small (large) weights to memristors stuck-off (stuck-on). The value range transformation reduces the magnitude of the smallest and largest elements in the weight matrices, so that the stuck-at-faults introduce smaller errors. The experimental results demonstrate that the MT framework is capable of recovering 99% of the accuracy loss on both the MNIST and CIFAR-10 datasets without utilizing hardware-aware training.
The accuracy improvements come at the expense of an 8.19x and 9.23x overhead in power and area, respectively. Nevertheless, the overhead can be reduced by up to 50% by leveraging hardware-aware training.
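The row flipping transformation described in the summary can be sketched in NumPy as follows. This is a hypothetical illustration, not the paper's exact formulation: the fault model (cells stuck at weight 0 or 1), the absolute-error cost metric, and all function names are assumptions. The key idea it demonstrates is that negating a row is lossless for inference (the output is re-negated), yet may move intended weights closer to the values the defective cells are stuck at.

```python
import numpy as np

# Hypothetical cost metric: total deviation between the intended weights
# and the values the defective cells are stuck at (an assumption; the
# paper defines its own cost metric).
def fault_cost(W, fault_mask, fault_vals):
    return np.abs(W - fault_vals)[fault_mask].sum()

def flip_rows(W, fault_mask, fault_vals):
    """Negate each row whose flipped version incurs a lower fault cost.
    The per-row signs are recorded so the crossbar output can be
    corrected after the analog matrix-vector multiplication."""
    signs = np.ones(W.shape[0])
    Wt = W.copy()
    for i in range(W.shape[0]):
        keep = np.abs(W[i] - fault_vals[i])[fault_mask[i]].sum()
        flip = np.abs(-W[i] - fault_vals[i])[fault_mask[i]].sum()
        if flip < keep:
            Wt[i] = -W[i]
            signs[i] = -1.0
    return Wt, signs

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, (4, 4))
fault_mask = rng.random((4, 4)) < 0.2                       # ~20% defective cells
fault_vals = np.where(rng.random((4, 4)) < 0.5, 1.0, 0.0)   # stuck-on / stuck-off

Wt, signs = flip_rows(W, fault_mask, fault_vals)

x = rng.uniform(0.0, 1.0, 4)
# Inference uses the faulty array; flipped rows are re-negated at the output:
y = signs * (np.where(fault_mask, fault_vals, Wt) @ x)
```

Because Wt is just W with some rows negated, `signs * (Wt @ x)` reproduces `W @ x` exactly on a fault-free array; the transformation only changes which stuck values the defective cells end up approximating.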
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2019.2944582