A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute
| Published in | IEEE Journal of Solid-State Circuits, Vol. 54, No. 6, pp. 1789–1799 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2019 |
Summary: Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input-activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as $8\times 8 = 64$ in-memory-computing neuron tiles, supporting up to 512, $3\times 3\times 512$-input HL neurons and 64, $3\times 3\times 3$-input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure with $1.8\times$ the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18,876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch-normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.
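To make the hidden-layer compute model concrete, the sketch below models a single binary-input/binary-weight neuron in NumPy: the XNOR of each activation/weight pair stands in for a bit cell's output, and the capacitive charge sharing performed by the 8T cell with its MOM capacitor is idealized as a simple average of those outputs. The function name, the threshold parameter (into which batch normalization could be folded), and the ideal-capacitor behavior are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def binary_neuron_charge_domain(ia_bits, w_bits, threshold=0.5):
    """Idealized model of one binary hidden-layer (HL) neuron.

    ia_bits, w_bits: arrays of {0, 1} values (binary input activations and
    weights, e.g. a flattened 3x3x512 receptive field).
    threshold: decision level; batch normalization can be folded into it.

    In the charge-domain scheme, each bit cell drives its local capacitor
    according to XNOR(ia, w); shorting the capacitors together averages the
    per-bit results, so the analog output is proportional to the fraction of
    matching bits (ideal, mismatch-free capacitors assumed here).
    """
    xnor = 1 - np.bitwise_xor(ia_bits, w_bits)   # 1 where activation == weight
    v_out = xnor.mean()                          # charge sharing ~ averaging
    return 1 if v_out >= threshold else 0        # binarized output activation

# Hypothetical usage: one 3x3x512-input HL neuron with random binary data.
rng = np.random.default_rng(0)
ia = rng.integers(0, 2, size=3 * 3 * 512)
w = rng.integers(0, 2, size=3 * 3 * 512)
print(binary_neuron_charge_domain(ia, w))
```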
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2019.2899730