Hardware-Software Co-Design Implementation of Fixed-Point GoogleNet on SoC Using Xilinx Vitis
| Published in | 2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES), pp. 274 - 278 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 21.10.2023 |
Summary: The use of convolutional neural networks (CNNs) has gained significant popularity in recent years due to their effectiveness in many applications, such as image recognition and classification. Field-programmable gate arrays (FPGAs) have become increasingly appealing compared to GPUs and CPUs because of their energy efficiency, high throughput, and scalability. However, CNNs are computationally intensive and require large amounts of memory, making their implementation on resource-limited devices such as FPGAs challenging. High-Level Synthesis (HLS) shortens design time, reduces the programming workload, and improves FPGA design efficiency; the HLS co-design flow also enables faster design-space exploration and optimization, facilitating rapid prototyping and convenient modification. In this paper, a Hardware/Software (HW/SW) co-design approach is introduced for implementing GoogleNet, a popular CNN architecture, on the Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit using the Xilinx Vitis tool. The most computationally intensive components are offloaded to the FPGA, while the remaining parts of the network run on an embedded Central Processing Unit (CPU). The model is then converted to fixed-point arithmetic through post-training quantization and tuned with different HLS optimizations, reducing the required hardware resources while keeping power consumption low. Experimental results show that the model maintains high accuracy while achieving significant reductions in the hardware resources required for the FPGA implementation. With 20-bit fixed-point data precision, the design consumes 2.49 W of total on-chip power and uses fewer hardware resources than the corresponding RTL accelerator.
DOI: 10.1109/NILES59815.2023.10296801
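The abstract above describes offloading convolution workloads to the FPGA through Vitis HLS and quantizing the network to a 20-bit fixed-point format after training. The following is a minimal illustrative sketch, in Vitis HLS style C++, of what such a fixed-point convolution kernel can look like; it is not the paper's accelerator. The layer dimensions, the `ap_fixed<20, 8>` integer/fraction split, and the `conv_layer` name are assumptions made for illustration only.

```cpp
// Illustrative sketch only: a direct convolution kernel in Vitis HLS style
// using 20-bit fixed-point data. Layer dimensions, the integer/fraction split
// of the ap_fixed format, and all names are hypothetical; this is not the
// accelerator described in the paper.
#include <ap_fixed.h>

// Assumed 20-bit fixed-point format: 8 integer bits, 12 fractional bits.
typedef ap_fixed<20, 8> data_t;

// Hypothetical small layer dimensions chosen for readability.
const int IN_CH  = 16;           // input channels
const int OUT_CH = 32;           // output channels
const int IMG    = 28;           // input feature-map height/width
const int K      = 3;            // kernel size
const int OUT    = IMG - K + 1;  // output size (stride 1, no padding)

// Direct convolution over fixed-point data. The PIPELINE pragma asks HLS to
// overlap iterations of the pixel loop; the inner reduction loops are then
// unrolled by the tool.
void conv_layer(const data_t in[IN_CH][IMG][IMG],
                const data_t weights[OUT_CH][IN_CH][K][K],
                const data_t bias[OUT_CH],
                data_t out[OUT_CH][OUT][OUT]) {
    for (int oc = 0; oc < OUT_CH; ++oc) {
        for (int y = 0; y < OUT; ++y) {
            for (int x = 0; x < OUT; ++x) {
#pragma HLS PIPELINE II=1
                data_t acc = bias[oc];
                for (int ic = 0; ic < IN_CH; ++ic) {
                    for (int ky = 0; ky < K; ++ky) {
                        for (int kx = 0; kx < K; ++kx) {
                            acc += in[ic][y + ky][x + kx] * weights[oc][ic][ky][kx];
                        }
                    }
                }
                out[oc][y][x] = acc;
            }
        }
    }
}
```

In this setting, post-training quantization amounts to rounding the trained floating-point weights and activations into the chosen fixed-point format and verifying the resulting accuracy, which the abstract reports remains high at 20 bits.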