Hardware-Software Co-Design Implementation of Fixed-Point GoogleNet on SoC Using Xilinx Vitis

Bibliographic Details
Published in: 2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES), pp. 274-278
Main Authors: Elhewehy, Mohamed A.; Abbass, Karim O.; Nasr, Omar A.
Format: Conference Proceeding
Language: English
Published: IEEE, 21.10.2023
Summary: The use of convolutional neural networks (CNNs) has gained significant popularity in recent years due to their effectiveness in many applications, such as image recognition and classification. Field-programmable gate arrays (FPGAs) have become increasingly appealing alternatives to GPUs and CPUs owing to their energy efficiency, high throughput, and scalability. However, CNNs are computationally intensive and require large amounts of memory, making their implementation on resource-limited devices such as FPGAs challenging. High-Level Synthesis (HLS) shortens design time, reduces the programming workload, and improves FPGA design efficiency; the HLS co-design flow also enables faster design-space exploration and optimization, facilitating rapid prototyping and convenient modification. In this paper, a Hardware/Software (HW/SW) co-design approach is introduced for implementing GoogleNet, a popular CNN architecture, on the Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit using the Xilinx Vitis tool. The approach offloads the most computationally intensive components to the FPGA, while the remaining parts of the network run on an embedded Central Processing Unit (CPU). The model is then converted to fixed-point arithmetic using post-training quantization, and different HLS optimizations are applied, reducing hardware resource usage while keeping power low. Experimental results show that the model maintains high accuracy while achieving significant reductions in the hardware resources required for FPGA implementation. With 20-bit fixed-point data precision, the design consumes 2.49 watts of total on-chip power and uses fewer hardware resources than the corresponding RTL accelerator.
DOI: 10.1109/NILES59815.2023.10296801
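
The summary describes offloading compute-intensive layers to the FPGA through Vitis HLS and representing data in 20-bit fixed-point. The following is a minimal Vitis HLS C++ sketch of what such an offloaded convolution kernel could look like. Only the 20-bit total width comes from the abstract; the integer/fraction split, tile sizes, loop structure, and pragma choices below are illustrative assumptions, not details taken from the paper.

#include <ap_fixed.h>

// 20-bit fixed-point type. The 8-bit integer part is an assumed split;
// the paper only states the total width of 20 bits.
typedef ap_fixed<20, 8> fixed_t;

// Illustrative tile dimensions, not taken from the paper.
const int IN_CH  = 16;
const int OUT_CH = 16;
const int K      = 3;
const int DIM    = 14;

// Hypothetical convolution tile: the kind of compute-intensive block that a
// HW/SW co-design flow would offload to the FPGA while the rest of the
// network runs on the embedded CPU.
extern "C" void conv_tile(const fixed_t ifm[IN_CH][DIM + K - 1][DIM + K - 1],
                          const fixed_t wts[OUT_CH][IN_CH][K][K],
                          fixed_t ofm[OUT_CH][DIM][DIM]) {
    // Partition the weights across input channels so the unrolled MACs can
    // read them in parallel (one of many possible HLS optimizations).
#pragma HLS ARRAY_PARTITION variable=wts complete dim=2

    for (int oc = 0; oc < OUT_CH; ++oc) {
        for (int y = 0; y < DIM; ++y) {
            for (int x = 0; x < DIM; ++x) {
#pragma HLS PIPELINE
                fixed_t acc = 0;
                // Multiply-accumulate over input channels and the KxK window.
                for (int ic = 0; ic < IN_CH; ++ic) {
                    for (int ky = 0; ky < K; ++ky) {
                        for (int kx = 0; kx < K; ++kx) {
                            acc += ifm[ic][y + ky][x + kx] * wts[oc][ic][ky][kx];
                        }
                    }
                }
                ofm[oc][y][x] = acc;
            }
        }
    }
}

In a Vitis co-design flow, a kernel like this would be compiled with v++ and invoked from host code running on the ZCU102's Arm cores (for example via the Xilinx Runtime, XRT), while non-offloaded layers remain in software; the paper's actual layer partitioning and quantization parameters are not reproduced here.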