10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors

Bibliographic Details
Published in	IEEE Access, Vol. 9, pp. 71262-71276
Main Authors	Nguyen, Van Truong, Kim, Jie-Seok, Lee, Jong-Wook
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Analog to digital converters; Artificial intelligence; Circuits; CMOS; Computer architecture; Computing-in-memory; deep neural network; Digital to analog conversion; Digital to analog converters; Edge computing; edge processor; Energy conversion efficiency; Energy efficiency; Layout; machine learning; Multiplication; Network latency; Neural networks; Parallel processing; Power consumption; Processors; Program processors; Random access memory; Sense amplifiers; Static random access memory; Throughput; Transistors; Weight

More Information
Summary:	Computing-in-memory (CIM) is a promising approach to reduce latency and improve the energy efficiency of the multiply-and-accumulate (MAC) operation under a memory wall constraint for artificial intelligence (AI) edge processors. This paper proposes an approach focusing on scalable CIM designs using a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting binary and multibit MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques such as an input-dependent dynamic reference generator and an input-boosted sense amplifier are presented. Fabricated in a 28 nm CMOS process, this design achieves a throughput of 409.6 GOPS, an energy efficiency of 1001.7 TOPS/W, and a throughput area efficiency of 169.9 TOPS/mm². The proposed approach effectively addresses previous problems such as write disturb, limited throughput, and the power consumption of an analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase the inference accuracy. We propose an architecture that divides the multiplication of a 4-b weight and a 4-b input into four 2-b multiplications performed in parallel, which increases the signal margin by 16× compared to conventional 4-b multiplication. In addition, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed using the intrinsic bit-line capacitance existing in the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network. These results demonstrate that the proposed 10T bit-cell is promising for realizing robust and scalable SRAM-CIM designs, which are essential for realizing fully parallel edge computing.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3079425
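
The summary describes splitting a 4-b weight by 4-b input multiplication into four 2-b multiplications. The arithmetic behind such a split can be sketched in Python as below. This is only an illustration of the number decomposition, not the paper's circuit: the function names `split_2b` and `mul4b_via_2b` are hypothetical, the operands are assumed to be unsigned, and the analog recombination through the CDAC is modeled here as plain integer shifts and adds.

```python
from typing import Tuple


def split_2b(value: int) -> Tuple[int, int]:
    """Split an unsigned 4-bit value into its (high, low) 2-bit halves."""
    if not 0 <= value <= 15:
        raise ValueError("expected an unsigned 4-bit value (0..15)")
    return value >> 2, value & 0b11


def mul4b_via_2b(weight: int, inp: int) -> int:
    """Multiply two unsigned 4-bit values using four 2-bit partial products.

    With w = 4*w_hi + w_lo and x = 4*x_hi + x_lo:
        w*x = 16*w_hi*x_hi + 4*(w_hi*x_lo + w_lo*x_hi) + w_lo*x_lo
    """
    w_hi, w_lo = split_2b(weight)
    x_hi, x_lo = split_2b(inp)
    # Each entry is (2-bit x 2-bit partial product, left shift for its weight).
    partials = [
        (w_hi * x_hi, 4),
        (w_hi * x_lo, 2),
        (w_lo * x_hi, 2),
        (w_lo * x_lo, 0),
    ]
    return sum(product << shift for product, shift in partials)


if __name__ == "__main__":
    print(mul4b_via_2b(13, 11))  # 13 * 11 = 143
```

Each partial product is at most 3 × 3 = 9, and the largest recombined result is 15 × 15 = 225, which fits in the 8-b output stated in the summary.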