A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression

Bibliographic Details
Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors: Lee, Hyunseung; Hong, Jihoon; Kim, Soosung; Lee, Seung Yul; Lee, Jae W.
Format: Conference Proceeding
Language: English
Published: IEEE, 09.07.2023
DOI: 10.1109/DAC56929.2023.10248005

Summary: Model compression is widely adopted for edge inference of neural networks (NNs) to minimize both costly DRAM accesses and memory footprint. Recently, XOR-based model compression has shown promising results, maximizing compression ratio while minimizing accuracy drop. However, XOR-based decompression alone produces bit errors and requires auxiliary data for error correction. To minimize model size, and hence DRAM traffic, we propose an enhanced decompression algorithm and a low-cost hardware accelerator for it. Since not all errors are equal, our algorithm selects only the important errors to correct, with no accuracy drop. Compared with the baseline XOR compression scheme that corrects all errors, the compressed model size of ResNet-18 and VGG-16 is reduced by 23% and 27%, respectively. We also present a low-cost hardware implementation of on-line XOR decompression and error-correction logic built on Gemmini, an open-source systolic array accelerator, at the cost of only a 0.39% increase in area and a 0.46% increase in power.
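The abstract only outlines the approach, so the following is a minimal, hypothetical Python sketch of the general idea of XOR-based decompression followed by selective error correction; it is not the authors' algorithm or the Gemmini hardware design. All names (xor_decompress, apply_selected_corrections, the toy key and correction list) are illustrative assumptions: weights reconstructed by XOR may contain bit errors, and only the errors deemed important (here, a high-order bit flip) are listed in the auxiliary correction data.

```python
import numpy as np

def xor_decompress(compressed_words, key_words):
    """Reconstruct approximate weight words by XOR-ing each stored
    (compressed) word with a shared key word. In a real XOR compression
    scheme this reconstruction is lossy: some bits may not match the
    original weights."""
    return np.bitwise_xor(compressed_words, key_words)

def apply_selected_corrections(approx_words, corrections):
    """Flip only the bit errors that were judged important enough to be
    stored as auxiliary correction data, given as (word index, bit
    position) pairs. Unimportant errors stay uncorrected, which keeps
    the auxiliary data (and hence the model) small."""
    corrected = approx_words.copy()
    for word_idx, bit_pos in corrections:
        corrected[word_idx] ^= np.uint8(1 << bit_pos)
    return corrected

# --- Toy usage; all values are made up for illustration ---
rng = np.random.default_rng(0)
original = rng.integers(0, 2**8, size=8, dtype=np.uint8)   # "true" weight bytes
key      = rng.integers(0, 2**8, size=8, dtype=np.uint8)   # shared XOR key
compressed = np.bitwise_xor(original, key)                  # exact by construction here
compressed[2] ^= np.uint8(1 << 7)   # inject an important (MSB) bit error
compressed[5] ^= np.uint8(1 << 0)   # inject an unimportant (LSB) bit error

approx = xor_decompress(compressed, key)
corrections = [(2, 7)]              # only the MSB error is worth storing
recovered = apply_selected_corrections(approx, corrections)

print("max abs error before selective correction:",
      int(np.max(np.abs(original.astype(int) - approx.astype(int)))))
print("max abs error after selective correction:",
      int(np.max(np.abs(original.astype(int) - recovered.astype(int)))))
```

In this toy run the uncorrected LSB error contributes only a small weight perturbation, which mirrors the paper's premise that not all bit errors matter equally for accuracy.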