Accelerating Inference of Convolutional Neural Networks Using In-memory Computing

Bibliographic Details
Published in: Frontiers in Computational Neuroscience, Vol. 15, p. 674154
Main Authors: Martino Dazzi, Abu Sebastian, Luca Benini, Evangelos Eleftheriou
Format: Journal Article
Language: English
Published: Switzerland, Frontiers Research Foundation / Frontiers Media S.A., 03.08.2021

Summary: In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as a promising approach to energy-efficient, high-throughput hardware for deep-learning applications. One prominent application of IMC is the execution of matrix-vector multiplication in constant time complexity, achieved by mapping the synaptic weights of a neural-network layer onto the devices of an IMC core. However, because its pattern of execution differs significantly from that of previous computational paradigms, IMC requires a rethinking of the architectural design choices made when designing deep-learning hardware. In this work, we focus on application-specific IMC hardware for inference of Convolutional Neural Networks (CNNs) and provide methodologies for implementing the various architectural components of the IMC core. Specifically, we present methods for mapping synaptic weights and activations onto the memory structures and give evidence of the various trade-offs therein, such as that between on-chip memory requirements and execution latency. Lastly, we show how to employ these methods to implement a pipelined dataflow that offers throughput and latency beyond the state of the art for image-classification tasks.
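The weight-mapping idea in the summary can be illustrated with a small NumPy simulation. This is a hedged sketch, not the paper's implementation: all shapes and variable names below are illustrative. It shows how the synaptic weights of one convolutional layer can be unrolled into a 2-D matrix (the crossbar's conductances), so that every output activation for an input patch is produced by a single matrix-vector multiplication, which an IMC crossbar performs in constant time in the analog domain.

```python
import numpy as np

# Illustrative simulation of mapping one convolutional layer onto an
# IMC crossbar (names and sizes are assumptions, not from the paper).
c_in, c_out, k = 3, 8, 3              # input channels, output channels, kernel size
rng = np.random.default_rng(0)
weights = rng.standard_normal((c_out, c_in, k, k))

# Unroll the kernels: each crossbar row holds one flattened filter,
# playing the role of a row of programmed device conductances.
crossbar = weights.reshape(c_out, c_in * k * k)   # shape (8, 27)

# One input patch, flattened: the activations applied as voltages
# on the crossbar's input lines.
patch = rng.standard_normal((c_in, k, k)).reshape(-1)   # shape (27,)

# A single (analog, constant-time on hardware) MVM yields all c_out
# output activations for this spatial position at once.
out = crossbar @ patch
assert out.shape == (c_out,)
```

Sliding this patch extraction over the input feature map (im2col-style) turns the whole convolution into a sequence of such single-shot matrix-vector products, which is what makes the on-chip mapping and dataflow choices discussed in the article critical for throughput.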
Edited by: Oliver Rhodes, The University of Manchester, United Kingdom
Reviewed by: Shimeng Yu, Georgia Institute of Technology, United States; Rishad Shafik, Newcastle University, United Kingdom
ISSN: 1662-5188
DOI: 10.3389/fncom.2021.674154