Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications


Bibliographic Details
Published in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 488-490
Main Authors: Biswas, Avishek; Chandrakasan, Anantha P.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2018

Summary: Convolutional neural networks (CNNs) provide state-of-the-art results in a wide variety of machine learning (ML) applications, ranging from image classification to speech recognition. However, they are very computationally intensive and require huge amounts of storage. Recent work has strived to reduce the size of CNNs: [1] proposes a binary-weight network (BWN), where the filter weights (w_i's) are ±1 (with a common scaling factor per filter, α). This leads to a significant reduction in the storage required for the w_i's, making it possible to store them entirely on-chip. However, in a conventional all-digital implementation [2, 3], reading the w_i's and the partial sums from the embedded SRAMs requires a lot of data movement per computation, which is energy-hungry. To reduce data movement, and the associated energy, we present an SRAM-embedded convolution architecture (Fig. 31.1.1) that does not require reading the w_i's explicitly from the memory. Prior work on embedded ML classifiers has focused on 1b outputs [4] or a small number of output classes [5], neither of which is sufficient for CNNs. This work uses 7b inputs/outputs, which is sufficient to maintain good accuracy for most popular CNNs [1]. The convolution operation is implemented as voltage averaging (Fig. 31.1.1), since the w_i's are binary, while the averaging factor (1/N) implements the weight coefficient α (with a new scaling factor, M, implemented off-chip).
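To make the averaging idea concrete, here is a minimal numerical sketch (Python/NumPy, not from the paper) of a binary-weight dot product computed as an average. The function name and the relation M = α·N between the off-chip scaling factor and the averaging factor are illustrative assumptions; the chip performs the averaging in the analog voltage domain, which this floating-point model does not capture.

```python
import numpy as np

def bwn_conv_as_average(x, w_sign, alpha):
    """Binary-weight convolution expressed as an average (illustrative sketch).

    x      : input activations (7b in the paper)
    w_sign : binary weights, each +1 or -1
    alpha  : per-filter scaling factor

    The exact BWN dot product is alpha * sum(w_i * x_i). The array instead
    computes the average (1/N) * sum(w_i * x_i); multiplying by an off-chip
    scaling factor M (assumed here to be alpha * N) recovers the result.
    """
    n = len(x)
    avg = np.dot(w_sign, x) / n   # what the SRAM array computes (in analog)
    m = alpha * n                 # assumed off-chip scaling factor
    return m * avg                # equals alpha * sum(w_i * x_i)

# Sanity check against the exact binary-weight dot product
rng = np.random.default_rng(0)
x = rng.integers(0, 128, size=9)      # 7b inputs, e.g. a 3x3 patch
w = rng.choice([-1, 1], size=9)       # BWN weights are +/-1
alpha = 0.05
assert np.isclose(bwn_conv_as_average(x, w, alpha), alpha * np.dot(w, x))
```

Because the weights are only signs, the averaging step needs no multipliers; the division by N is absorbed into the single off-chip scale M, which is why the paper can keep the in-memory computation to sign-selection and voltage averaging alone.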
ISSN: 2376-8606
DOI: 10.1109/ISSCC.2018.8310397