Research on Efficient CNN Acceleration Through Mixed Precision Quantization: A Comprehensive Methodology


Bibliographic Details
Published in: International Journal of Advanced Computer Science & Applications, Vol. 14, No. 12
Main Authors: He, Yizhi; Liu, Wenlong; Tahir, Muhammad; Li, Zhao; Zhang, Shaoshuang; Amur, Hussain Bux
Format: Journal Article
Language: English
Published: West Yorkshire: Science and Information (SAI) Organization Limited, 2023

Summary: To overcome challenges associated with deploying Convolutional Neural Networks (CNNs) on edge computing devices with limited memory and computing resources, we propose a mixed-precision CNN calculation method on a Field Programmable Gate Array (FPGA). This approach involves a collaborative design encompassing both software and hardware aspects. First, we devise a CNN quantization method tailored to the fixed-point operation characteristics of FPGAs, addressing the computational challenges posed by floating-point parameters. We introduce a bit-width strategy search algorithm that assigns a bit-width to each layer based on the CNN loss variation induced by quantization; through retraining, this strategy mitigates the degradation in CNN inference accuracy. For the FPGA acceleration design, we employ a flow processing architecture with multiple Processing Elements (PEs) to support mixed-precision CNNs, together with a folding design method that shares PEs between layers and significantly reduces FPGA resource usage. Furthermore, we design a data reading method that places a register-set buffer between memory and the processing elements to alleviate mismatches between data reading and computing speeds. Our implementation of the mixed-precision ResNet20 model on the Kintex-7 Eco R2 development board achieves an inference accuracy of 91.68% on the CIFAR-10 dataset, an accuracy drop of only 1.21%, and a computing speed 4.27 times that of the Central Processing Unit (CPU). Compared to a unified 16-bit FPGA accelerator design method, our proposed approach demonstrates an 89-fold increase in computing speed while maintaining similar accuracy.
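
To make the per-layer bit-width search concrete, the Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each layer's weights are quantized to candidate fixed-point bit-widths, and a greedy pass assigns the narrowest width whose induced loss variation on a calibration set stays below a tolerance. All names here (quantize_tensor, search_bitwidths, the toy loss) are hypothetical.

    import numpy as np

    def quantize_tensor(w, bits):
        # Symmetric fixed-point quantization: map weights onto the signed
        # integer grid [-(2^(bits-1)), 2^(bits-1) - 1], then rescale.
        qmax = 2 ** (bits - 1) - 1
        peak = float(np.max(np.abs(w)))
        scale = peak / qmax if peak > 0 else 1.0
        return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

    def search_bitwidths(layers, loss_fn, candidates=(4, 6, 8, 16), tol=0.05):
        # Greedy per-layer assignment: choose the narrowest candidate whose
        # loss increase over the float baseline stays below `tol`, then
        # commit that quantization before moving to the next layer.
        base = loss_fn(layers)
        assignment = {}
        for name in layers:
            chosen = max(candidates)  # widest width as a fallback
            for bits in sorted(candidates):
                trial = dict(layers)
                trial[name] = quantize_tensor(layers[name], bits)
                if loss_fn(trial) - base <= tol:
                    chosen = bits
                    break
            layers[name] = quantize_tensor(layers[name], chosen)
            assignment[name] = chosen
        return assignment

    # Toy usage: two linear "layers" and a squared-error calibration loss.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(128, 16))
    layers = {"conv1": rng.normal(size=(16, 32)), "fc": rng.normal(size=(32, 1))}
    target = x @ layers["conv1"] @ layers["fc"]
    loss = lambda ls: float(np.mean((x @ ls["conv1"] @ ls["fc"] - target) ** 2))
    print(search_bitwidths(dict(layers), loss))  # prints a per-layer bit-width dict

In the paper, the searched strategy is followed by retraining to recover inference accuracy; the sketch omits that step and the hardware mapping onto shared PEs.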
ISSN: 2158-107X; 2156-5570
DOI: 10.14569/IJACSA.2023.0141282