An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence


Bibliographic Details
Published in: IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, Vol. 8, no. 2, pp. 111-118
Main Authors: Xiao, Rui; Jiang, Wenyu; Chee, Piew Yoong
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2022

Summary: The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM's peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing that shares the peripheral circuits and processes one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, the power and energy overhead of the digital-to-analog converters (DACs), which turns out to be even greater than that of the analog-to-digital converters (ADCs), can be fine-tuned in TM-CIM for significant improvement. For a 256×256 crossbar array with a typical setting, TM-CIM saves 18.4× in energy with 0.136 pJ/MAC efficiency, and saves 19.9× in area for the 1T1R case and 15.9× for the 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over 16× in area. A tradeoff between chip area, peak power, and latency is also presented, together with a proposed scheme to further reduce latency on VGG-16 without significantly increasing chip area and peak power.
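The core idea in the summary — processing a memristive crossbar one column at a time so that a single set of peripheral converters is shared across all columns — can be illustrated with a minimal behavioral sketch. This is not the paper's implementation: the function name `tm_cim_matvec`, the uniform-quantization ADC model, and all parameter values are illustrative assumptions.

```python
import numpy as np

def tm_cim_matvec(G, v, adc_levels=256):
    """Illustrative model of a time-multiplexed crossbar matrix-vector
    multiply (hypothetical helper, not the paper's circuit).

    G: (rows, cols) array of cell conductances.
    v: (rows,) input voltage vector applied on the word lines.

    Columns are digitized one per time step by a single shared ADC,
    modeled here as uniform quantization to `adc_levels` codes,
    instead of instantiating one ADC per column.
    """
    rows, cols = G.shape
    out = np.empty(cols)
    # Full-scale current: every cell at max conductance, max input.
    full_scale = rows * G.max() * v.max()
    for j in range(cols):              # one column selected per step
        i_col = np.dot(v, G[:, j])     # ideal analog column current
        # Shared ADC: round the column current to the nearest code.
        code = round(i_col / full_scale * (adc_levels - 1))
        out[j] = code / (adc_levels - 1) * full_scale
    return out
```

In this toy model the time-multiplexing shows up as the per-column loop: latency grows with the column count, but only one converter is active at any moment, which is the area/peak-power trade the summary describes.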
ISSN: 2329-9231
DOI: 10.1109/JXCDC.2022.3206879