Enabling High-Efficient ReRAM-based CNN Training via Exploiting Crossbar-Level Insignificant Writing Elimination

Bibliographic Details
Published in: IEEE Transactions on Computers, Vol. 72, No. 11, pp. 1-12
Main Authors: Wang, Lening; Wan, Qiyu; Ma, Peixun; Wang, Jing; Chen, Minsong; Song, Shuaiwen Leon; Fu, Xin
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023

Summary: Convolutional neural networks (CNNs) have been widely adopted in many deep learning applications. However, training a deep CNN requires intensive data transfer, which is both time and energy consuming. Using resistive random-access memory (ReRAM) to process data locally in memory is an emerging solution for eliminating this massive data movement. However, current ReRAM-based processing-in-memory (PIM) accelerators cannot support training efficiently because of the frequent and costly ReRAM write operations, which degrade delay, energy, and ReRAM lifetime. In this paper, we observe that activation-induced and weight-update-induced write operations dominate the training energy on ReRAM-based accelerators. We then exploit a new angle on intermediate-data sparsity (e.g., in activations and errors) that fits the unique computation pattern of ReRAM crossbars to eliminate insignificant ReRAM writes, enabling highly efficient CNN training without hurting training accuracy. Experimental results show that our proposed scheme achieves an average of 4.97× (19.23×) energy saving and 1.38× (30.08×) speedup compared to the state-of-the-art ReRAM-based accelerator (GPU). Our scheme also achieves a 4.55× lifetime enhancement compared to the state-of-the-art ReRAM accelerator.
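
Illustration: the summary describes skipping "insignificant" crossbar writes when the intermediate data (activations or errors) destined for a crossbar tile is largely zero. The sketch below is not the authors' implementation; the tile size, the near-zero magnitude cutoff, and the sparsity threshold are hypothetical parameters chosen only to show the idea of eliding a write at crossbar granularity.

import numpy as np

CROSSBAR_ROWS, CROSSBAR_COLS = 128, 128   # assumed crossbar tile size
MAGNITUDE_EPS = 1e-3                      # values below this count as near-zero (assumed cutoff)
SPARSITY_THRESHOLD = 0.9                  # skip the write if >= 90% of the tile is near-zero (assumed)

def write_tile_if_significant(crossbar: np.ndarray, tile: np.ndarray) -> bool:
    """Write `tile` into the (simulated) crossbar only if it carries enough
    non-trivial values; return True if the costly write was performed."""
    near_zero_fraction = np.mean(np.abs(tile) < MAGNITUDE_EPS)
    if near_zero_fraction >= SPARSITY_THRESHOLD:
        return False                      # insignificant tile: elide the ReRAM write
    crossbar[:tile.shape[0], :tile.shape[1]] = tile
    return True

# Usage: a mostly-zero activation tile is skipped, a dense one is written.
rng = np.random.default_rng(0)
crossbar = np.zeros((CROSSBAR_ROWS, CROSSBAR_COLS))
sparse_tile = np.where(rng.random((128, 128)) < 0.95, 0.0, rng.standard_normal((128, 128)))
dense_tile = rng.standard_normal((128, 128))
print(write_tile_if_significant(crossbar, sparse_tile))  # False: write elided
print(write_tile_if_significant(crossbar, dense_tile))   # True: write performed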
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2023.3288763