A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 68, No. 6, pp. 2142-2146
Main Authors: Xu, Jiawei; Huan, Yuxiang; Huang, Boming; Chu, Haoming; Jin, Yi; Zheng, Li-Rong; Zou, Zhuo
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2021

More Information
Summary: This brief presents a memory-efficient CNN accelerator design for resource-constrained devices in Internet of Things (IoT) and autonomous systems. A segmented logarithmic (SegLog) quantization method is exploited to mitigate the on-chip memory and bandwidth requirements, thus accommodating more processing elements (PEs) in a given chip area to organize a reconfigurable multi-cluster architecture. The evaluation results show that SegLog quantization can achieve 6.4× model compression with less than 2.5% accuracy loss on various CNNs. An ASIC implementation configured with 168 PEs is validated in a 40-nm CMOS process, reporting 2.54 TOPs/W energy efficiency and a 0.8 mm² chip area. The accelerator has also been implemented on an FPGA with 1512 PEs and 468 kB of on-chip memory, achieving a memory efficiency of 1.29 GOPs/kB. Compared with state-of-the-art accelerators, the ASIC implementation improves area efficiency and arithmetic intensity by 1.94× and 5.62×, respectively, while the FPGA implementation improves memory efficiency by a factor of 2.34×.
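
The abstract does not spell out how SegLog encodes the weights; the sketch below is only a rough illustration of the family of techniques it belongs to. All function names, bit widths, and the two-segment threshold are illustrative assumptions, not the paper's method. It shows plain logarithmic quantization (each weight snapped to a signed power of a base) and a hypothetical two-segment variant that switches to a finer base for small weights. The quoted 6.4× compression over 32-bit floating-point weights is consistent with a 5-bit per-weight code (32 / 5 = 6.4).

```python
import numpy as np

def log_quantize(w, base=2.0, n_bits=5):
    """Logarithmic quantization: map each weight to sign(w) * base**e with an
    integer exponent e, so only a sign bit plus a small exponent field needs
    to be stored (a 5-bit code in place of a 32-bit float gives ~6.4x
    compression). Bit width and clipping range here are illustrative."""
    sign = np.sign(w)
    mag = np.abs(w)
    nz = mag > 0                                  # zeros stay zero
    exp = np.zeros_like(mag)
    exp[nz] = np.round(np.log(mag[nz]) / np.log(base))
    e_lim = 2 ** (n_bits - 2)                     # one bit reserved for the sign
    exp = np.clip(exp, -e_lim, e_lim - 1)
    return sign * np.power(base, exp) * nz

def seglog_quantize(w, threshold=0.25):
    """Hypothetical two-segment scheme: small-magnitude weights use a finer
    base (sqrt(2)) for resolution, larger weights use base 2 for range.
    This only illustrates the idea of segmenting a logarithmic code; the
    paper's actual SegLog coding and bit allocation are not given here."""
    fine = log_quantize(w, base=np.sqrt(2.0))
    coarse = log_quantize(w, base=2.0)
    return np.where(np.abs(w) < threshold, fine, coarse)

if __name__ == "__main__":
    w = np.array([0.003, -0.09, 0.3, -0.7, 1.6], dtype=np.float32)
    print(seglog_quantize(w))   # each weight snapped to a signed power of 2 or sqrt(2)
```

A common hardware motivation for power-of-two weight codes is that multiplications can be replaced by shifts; the abstract itself attributes the increased PE count to the reduced on-chip memory and bandwidth requirements.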
ISSN:1549-7747
1558-3791
DOI:10.1109/TCSII.2020.3038897