Deep In-Memory Architectures in SRAM: An Analog Approach to Approximate Computing

Bibliographic Details
Published in	Proceedings of the IEEE, Vol. 108, no. 12, pp. 2251-2275
Main Authors	Kang, Mingu; Gonugondla, Sujan K.; Shanbhag, Naresh R.
Format	Journal Article
Language	English
Published	New York: IEEE, 01.12.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accelerator; Accuracy; Algorithms; Analog circuits; Approximate computing; Approximation algorithms; Artificial intelligence; Circuits; CMOS; Computation; Computer architecture; Design parameters; Energy efficiency; In-memory computing; Machine learning (ML); Memory; Non-von Neumann; Robustness (mathematics); Signal to noise ratio; Static random access memory

More Information
Summary:	This article provides an overview of recently proposed deep in-memory architectures (DIMAs) in SRAM for energy- and latency-efficient hardware realization of machine learning (ML) algorithms. DIMA tackles the data movement problem in von Neumann architectures head-on by deeply embedding mixed-signal computations into a conventional memory array. In doing so, it trades off its computational signal-to-noise ratio (compute SNR) against energy and latency, and therefore represents an analog form of approximate computing. DIMA exploits the inherent error immunity of ML algorithms and SNR budgeting methods to operate its analog circuitry in a low-swing/low-compute SNR regime, thereby achieving a >100× reduction in the energy-delay product (EDP) over an equivalent von Neumann architecture with no loss in inference accuracy. This article describes DIMA's computational pipeline and provides a Shannon-inspired rationale for its robustness to process, temperature, and voltage variations, along with design guidelines to manage its analog nonidealities. DIMA's versatility, effectiveness, and practicality, as demonstrated via multiple silicon IC prototypes in a 65-nm CMOS process, are described. A DIMA-based instruction set architecture (ISA) that realizes an end-to-end application-to-architecture mapping for accelerating diverse ML algorithms is also presented. Finally, DIMA's fundamental tradeoff between energy and accuracy in the low-compute SNR regime is analyzed to determine energy-optimum design parameters.
ISSN:	0018-9219 1558-2256
DOI:	10.1109/JPROC.2020.3034117