Design of Power-Efficient Training Accelerator for Convolution Neural Networks

Bibliographic Details
Published in	Electronics (Basel) Vol. 10; no. 7; p. 787
Main Authors	Hong, JiUn; Arslan, Saad; Lee, TaeGeon; Kim, HyungWon
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.04.2021
Subjects	Accelerators; Accuracy; Artificial neural networks; Back propagation; Classification; Convolution; Data paths; Deep learning; Design; Edge computing; Energy consumption; Energy efficiency; Field programmable gate arrays; Floating point arithmetic; Graphics processing units; High speed; Inference; Internet of Things; Machine learning; Mobile computing; Neural networks; Object recognition; Power management; Propagation; Software; Training; Weight reduction

More Information
Summary:	Among deep learning techniques, the convolutional neural network (CNN), a type of deep neural network (DNN), is one of the most widely used models for image recognition applications. However, there is growing demand for lightweight, low-power neural network accelerators, not only for inference but also for training. In this paper, we propose a training accelerator with low power consumption and a compact chip size, targeted at mobile and edge computing applications. It achieves real-time processing of both inference and training using concurrent floating-point data paths. The proposed accelerator can be externally controlled and employs resource sharing and an integrated convolution-pooling block to achieve low area and low energy consumption. We implemented the proposed training accelerator on an FPGA (Field Programmable Gate Array) and evaluated its training performance on an MNIST CNN example in comparison with a PC equipped with a GPU (Graphics Processing Unit). While both approaches achieved a similar training accuracy of 95.1%, the proposed accelerator, when implemented as a silicon chip, reduced energy consumption by a factor of 480 compared with the GPU-based counterpart. Additionally, when implemented on an FPGA, it reduced energy consumption by more than a factor of 4.5 compared with an existing FPGA training accelerator for the MNIST dataset. The proposed accelerator is therefore better suited for deployment in mobile/edge nodes than existing software and hardware accelerators.
ISSN:	2079-9292
DOI:	10.3390/electronics10070787
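The summary names an integrated convolution-pooling block but does not describe its internals. As a rough software analogy only (a hypothetical Python/NumPy sketch, not the authors' hardware design), fusing a convolution layer with 2×2 max pooling means each pooled output can be computed from the four convolution results in its window and reduced on the fly, so the full intermediate feature map never has to be buffered. The function names `conv2d_valid`, `maxpool2x2`, and `fused_conv_maxpool`, the single-channel input, the 5×5 kernel, and the non-overlapping 2×2 max pooling are all assumptions made for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Reference path: 'valid' 2-D cross-correlation, as used in CNN layers."""
    image = np.asarray(image, dtype=float)
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def maxpool2x2(fmap):
    """Reference path: non-overlapping 2x2 max pooling (drops an odd last row/col)."""
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    return fmap[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def fused_conv_maxpool(image, kernel):
    """Hypothetical fused block: each pooled output keeps a running max over the
    four convolution results in its 2x2 window, so the full convolution map is
    never stored."""
    image = np.asarray(image, dtype=float)
    kh, kw = kernel.shape
    ph = (image.shape[0] - kh + 1) // 2   # pooled rows
    pw = (image.shape[1] - kw + 1) // 2   # pooled cols
    out = np.empty((ph, pw))
    for pr in range(ph):
        for pc in range(pw):
            best = -np.inf
            for dr in range(2):
                for dc in range(2):
                    r, c = 2 * pr + dr, 2 * pc + dc
                    acc = np.sum(image[r:r + kh, c:c + kw] * kernel)
                    best = max(best, acc)  # reduce immediately instead of buffering
            out[pr, pc] = best
    return out

# Usage: an MNIST-sized 28x28 input with an assumed 5x5 kernel.
rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))
k = rng.standard_normal((5, 5))
fused = fused_conv_maxpool(img, k)
assert np.allclose(fused, maxpool2x2(conv2d_valid(img, k)))
print(fused.shape)  # (12, 12)
```

Both paths produce the same pooled map; the fused version only needs a single running maximum per output instead of a 24×24 intermediate buffer, which is the kind of storage saving an integrated block could exploit in hardware.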