NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA


Bibliographic Details
Published in: 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6
Main Authors: Lou, Wenqi; Qian, Jiaming; Gong, Lei; Wang, Xuan; Wang, Chao; Zhou, Xuehai
Format: Conference Proceeding
Language: English
Published: EDAA, 01.04.2023

Summary: Recently, algorithm and hardware co-design for neural networks (NNs) has become the key to obtaining high-quality solutions. However, prior works lack consideration of the underlying hardware and thus suffer from a severely unbalanced neural architecture and hardware architecture search (NA-HAS) space on FPGAs, failing to unleash the performance potential. Nevertheless, a deeper joint search leads to a larger (multiplicative) search space, highly challenging the search. To this end, we propose NAF, an efficient differentiable search framework that jointly searches the networks (e.g., operations and bitwidths) and accelerators (e.g., heterogeneous multicores and mappings) under a balanced NA-HAS space. Concretely, we design a coarse-grained, hardware-friendly quantization algorithm and integrate it at block granularity into the co-search process. Meanwhile, we design a highly optimized block processing unit (BPU) with configurable key dataflows. Afterward, a dynamic hardware generation algorithm based on modeling and heuristic rules performs the critical HAS and quickly generates hardware feedback. Experimental results show that, compared with previous state-of-the-art (SOTA) co-design works, NAF improves throughput by 1.99x-6.84x on the Xilinx ZCU102 and energy efficiency by 17%-88% under similar accuracy on the ImageNet dataset.
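The "differentiable search" the summary refers to is typically built on a continuous relaxation of the choice among candidate operations, in the style of DARTS: each block's output is a softmax-weighted mixture of all candidates, so the architecture parameters can be trained by gradient descent. The sketch below is purely illustrative (the toy operations and names are hypothetical, not NAF's actual search space):

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over architecture parameters."""
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical candidate operations for one searchable block.
ops = [lambda x: x,          # identity
       lambda x: 2.0 * x,    # "cheap" op
       lambda x: x * x]      # "expensive" op

def mixed_op(x, alpha):
    """Continuous relaxation: the block output is the softmax-weighted sum
    of every candidate op, so alpha receives gradients during co-search."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops))

alpha = np.zeros(3)                 # untrained: all ops weighted equally
y = mixed_op(np.array([1.0, 2.0]), alpha)
```

After training, the op with the largest alpha entry is kept and the rest are discarded, giving a discrete architecture.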
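"Hardware-friendly quantization" in this context usually means mapping weights/activations to low-bitwidth integers with a single shared scale, which FPGAs implement cheaply. A minimal symmetric uniform quantizer, given only as a generic illustration of the idea (not NAF's actual coarse-grained scheme):

```python
import numpy as np

def quantize_uniform(x, bits):
    """Symmetric per-tensor uniform quantization to `bits` bits.

    Returns the dequantized (simulated-quantization) values and the scale;
    an accelerator would store round(x / scale) as signed integers.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(x).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, scale
```

During a joint search, the bitwidth `bits` becomes one of the per-block choices being optimized alongside the operation type.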
ISSN:1558-1101
DOI:10.23919/DATE56975.2023.10137094