Analog-memory-based 14nm Hardware Accelerator for Dense Deep Neural Networks including Transformers

Bibliographic Details
Published in: 2022 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3319-3323
Main Authors: Okazaki, Atsuya; Narayanan, Pritish; Ambrogio, Stefano; Hosokawa, Kohji; Tsai, Hsinyu; Nomura, Akiyo; Yasuda, Takeo; Mackin, Charles; Friz, Alexander; Ishii, Masatoshi; Kohda, Yasuteru; Spoon, Katie; Chen, An; Fasoli, Andrea; Rasch, Malte J.; Burr, Geoffrey W.
Format: Conference Proceeding
Language: English
Published: IEEE, 28.05.2022

Summary: Analog non-volatile memory (NVM)-based accelerators for deep neural networks perform high-throughput and energy-efficient multiply-accumulate (MAC) operations (e.g., high TeraOPS/W) by taking advantage of massively parallelized analog MAC operations, implemented with Ohm's law and Kirchhoff's current law on array-matrices of resistive devices. While the wide-integer and floating-point operations offered by conventional digital CMOS computing are much more suitable than analog computing for conventional applications that require high accuracy and true reproducibility, deep neural networks can still provide competitive end-to-end results even with modest (e.g., 4-bit) precision in synaptic operations. In this paper, we describe a 14-nm inference chip, comprising multiple 512 × 512 arrays of Phase Change Memory (PCM) devices, which can deliver software-equivalent inference accuracy for MNIST handwritten-digit recognition and recurrent LSTM benchmarks, by using compensation techniques to finesse analog-memory challenges such as conductance drift and noise. We also project accuracy for Natural Language Processing (NLP) tasks performed with a state-of-the-art large Transformer-based model, BERT, when mapped onto an extended version of this same fundamental chip architecture.
ISSN: 2158-1525
DOI: 10.1109/ISCAS48785.2022.9937292
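
Illustration: the summary above describes MAC operations realized with Ohm's law and Kirchhoff's current law on arrays of resistive devices, with compensation techniques for analog-memory effects such as conductance drift. The short Python sketch below is a minimal numerical illustration of that idea, assuming a normalized 512 × 512 conductance array (matching the tile size stated in the summary), a simple power-law drift model, and a global rescaling compensation; the drift exponent and the compensation scheme are illustrative assumptions, not the chip's actual calibration method.

import numpy as np

# Minimal sketch of an analog in-memory MAC on a resistive crossbar:
# each weight is stored as a conductance G, input activations are applied
# as word-line voltages V, and Ohm's law (I = G * V) plus Kirchhoff's
# current law (summing currents along each bit line) yield the dot products.
# Drift model and noise-free readout are illustrative assumptions.

rng = np.random.default_rng(0)

ROWS, COLS = 512, 512                      # crossbar tile size, as in the summary
G = rng.uniform(0.0, 1.0, (ROWS, COLS))    # programmed conductances (normalized)
v = rng.uniform(0.0, 1.0, ROWS)            # input voltages on the word lines

# Ideal analog MAC: each bit-line current is the weighted sum of inputs.
i_ideal = v @ G

# Hypothetical PCM conductance drift: G decays as (t / t0) ** -nu over time.
t, t0, nu = 3600.0, 1.0, 0.05              # assumed read time and drift exponent
G_drifted = G * (t / t0) ** -nu

# Simple global compensation (one illustrative technique): rescale the
# measured column currents by the known average drift factor.
i_drifted = v @ G_drifted
i_compensated = i_drifted / ((t / t0) ** -nu)

print("max |ideal - compensated| error:", np.max(np.abs(i_ideal - i_compensated)))

In the actual hardware the weighted sums are formed as physical bit-line currents rather than a software matrix product; the sketch only mirrors the arithmetic and the drift-rescaling idea.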