10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors
Published in | IEEE Access, Vol. 9, pp. 71262-71276 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE, 2021. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | Computing-in-memory (CIM) is a promising approach to reduce latency and improve the energy efficiency of the multiply-and-accumulate (MAC) operation under a memory wall constraint for artificial intelligence (AI) edge processors. This paper proposes an approach focusing on scalable CIM designs using a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting multibit and binary MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques such as an input-dependent dynamic reference generator and an input-boosted sense amplifier are presented. Fabricated in a 28 nm CMOS process, this design achieves a throughput of 409.6 GOPS, an energy efficiency of 1001.7 TOPS/W, and a throughput area efficiency of 169.9 TOPS/mm². The proposed approach effectively addresses earlier problems such as write disturb, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase the inference accuracy. We propose an architecture that divides the multiplication of a 4-b weight by a 4-b input into four parallel 2-b multiplications, which increases the signal margin by 16× compared to conventional 4-b multiplication. In addition, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed using the intrinsic bit-line capacitance existing in the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network. These results demonstrate that the proposed 10T bit-cell is promising for realizing robust and scalable SRAM-CIM designs, which are essential for realizing fully parallel edge computing. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3079425 |
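
The summary states that the second macro divides a 4-b weight by 4-b input multiplication into four parallel 2-b multiplications. The arithmetic behind such a split can be sketched in Python as below. This is only an illustration of the decomposition, not the paper's circuit: it assumes unsigned operands (the summary does not state signedness), it recombines the partial products digitally where the macro is described as using a CDAC, and the names `split_2b`, `mul_4b_via_2b`, and `mac_4b` are hypothetical.

```python
def split_2b(value):
    """Split an unsigned 4-bit value into its (high, low) 2-bit halves."""
    if not 0 <= value <= 15:
        raise ValueError("expected an unsigned 4-bit value (0..15)")
    return value >> 2, value & 0b11


def mul_4b_via_2b(weight, x):
    """Multiply two unsigned 4-bit values using four 2-b x 2-b products."""
    w_hi, w_lo = split_2b(weight)
    x_hi, x_lo = split_2b(x)

    # Four independent 2-b partial products; each lies in 0..9,
    # so they could be evaluated in parallel.
    p_hh = w_hi * x_hi
    p_hl = w_hi * x_lo
    p_lh = w_lo * x_hi
    p_ll = w_lo * x_lo

    # Recombine with binary weights 16, 4, 4, 1:
    # (4*w_hi + w_lo) * (4*x_hi + x_lo)
    #   = 16*w_hi*x_hi + 4*(w_hi*x_lo + w_lo*x_hi) + w_lo*x_lo
    return 16 * p_hh + 4 * (p_hl + p_lh) + p_ll


def mac_4b(weights, inputs):
    """Accumulate the products of paired 4-bit weights and inputs."""
    return sum(mul_4b_via_2b(w, x) for w, x in zip(weights, inputs))
```

Each partial product spans only ten levels (0 to 9) instead of the 226 levels (0 to 225) of a direct 4-b by 4-b product, which gives some intuition for why a split of this kind can ease sensing. The 16× signal-margin figure itself is the paper's claim and is not derived by this sketch.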