BLADE: An in-Cache Computing Architecture for Edge Devices

Bibliographic Details
Published in	IEEE Transactions on Computers, Vol. 69, no. 9, pp. 1349-1363
Main Authors	Simon, William Andrew; Qureshi, Yasir Mahmood; Rios, Marco; Levisse, Alexandre; Zapater, Marina; Atienza, David
Format	Journal Article
Language	English
Published	New York: IEEE, 01.09.2020 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects	Accelerators; Arrays; Artificial neural networks; Benchmark testing; bitline computing; Blades; Central Processing Unit; Computation; Computer architecture; Computer simulation; Cryptography; Devices; edge computing; In-memory computing; in-SRAM computing; Microprocessors; Neon; Parallel operation; Performance evaluation; Power consumption; Power management; Random access memory; Transistors; Workload; Workloads
Summary:	Area- and power-constrained edge devices are increasingly utilized to perform compute-intensive workloads, necessitating increasingly area- and power-efficient accelerators. In this context, in-SRAM computing performs hundreds of parallel operations on spatially local data common in many emerging workloads, while reducing power consumption due to data movement. However, in-SRAM computing faces many challenges, including integration into the existing architecture, arithmetic operation support, data corruption at high operating frequencies, inability to run at low voltages, and low area density. To meet these challenges, this article introduces BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an in-SRAM computing architecture that utilizes local wordline groups to perform computations at a frequency 2.8x higher than state-of-the-art in-SRAM computing architectures. BLADE is integrated into the cache hierarchy of low-voltage edge devices, and is simulated and benchmarked at the transistor, architecture, and software abstraction levels. Experimental results demonstrate performance/energy gains over an equivalent NEON-accelerated processor for a variety of edge device workloads, namely, cryptography (4x performance gain/6x energy reduction), video encoding (6x/2x), and convolutional neural networks (3x/1.5x), while maintaining the highest frequency/energy ratio (up to 2.2 GHz at 1 V) of any conventional in-SRAM computing architecture, and a low area overhead of less than 8 percent.
ISSN:	0018-9340, 1557-9956
DOI:	10.1109/TC.2020.2972528