A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression

Bibliographic Details
Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors: Lee, Hyunseung; Hong, Jihoon; Kim, Soosung; Lee, Seung Yul; Lee, Jae W.
Format: Conference Proceeding
Language: English
Published: IEEE, 09.07.2023
DOI: 10.1109/DAC56929.2023.10248005

Summary: Model compression is widely adopted for edge inference of neural networks (NNs) to minimize both costly DRAM accesses and memory footprint. Recently, XOR-based model compression has shown promising results, maximizing compression ratio while minimizing accuracy drop. However, XOR-based decompression alone produces bit errors and requires auxiliary data for error correction. To minimize model size, and hence DRAM traffic, we propose an enhanced decompression algorithm and a low-cost hardware accelerator for it. Since not all errors are equal, our algorithm selects only the important errors to correct, with no accuracy drop. Compared with the baseline XOR compression scheme that corrects all errors, the compressed model size of ResNet-18 and VGG-16 is reduced by 23% and 27%, respectively. We also present a low-cost hardware implementation of on-line XOR decompression and error-correction logic built on Gemmini, an open-source systolic array accelerator, at the cost of only a 0.39% increase in area and a 0.46% increase in power.
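The abstract only outlines the approach, so the following is a minimal, hypothetical Python sketch of the general idea of XOR-based decompression followed by selective error correction; it is not the authors' algorithm or the Gemmini hardware design. All names (xor_decompress, apply_selected_corrections, the toy key and correction list) are illustrative assumptions: weights reconstructed by XOR may contain bit errors, and only the errors deemed important (here, a high-order bit flip) are listed in the auxiliary correction data.

```python
import numpy as np

def xor_decompress(compressed_words, key_words):
    """Reconstruct approximate weight words by XOR-ing each stored
    (compressed) word with a shared key word. In a real XOR compression
    scheme this reconstruction is lossy: some bits may not match the
    original weights."""
    return np.bitwise_xor(compressed_words, key_words)

def apply_selected_corrections(approx_words, corrections):
    """Flip only the bit errors that were judged important enough to be
    stored as auxiliary correction data, given as (word index, bit
    position) pairs. Unimportant errors stay uncorrected, which keeps
    the auxiliary data (and hence the model) small."""
    corrected = approx_words.copy()
    for word_idx, bit_pos in corrections:
        corrected[word_idx] ^= np.uint8(1 << bit_pos)
    return corrected

# --- Toy usage; all values are made up for illustration ---
rng = np.random.default_rng(0)
original = rng.integers(0, 2**8, size=8, dtype=np.uint8)   # "true" weight bytes
key      = rng.integers(0, 2**8, size=8, dtype=np.uint8)   # shared XOR key
compressed = np.bitwise_xor(original, key)                  # exact by construction here
compressed[2] ^= np.uint8(1 << 7)   # inject an important (MSB) bit error
compressed[5] ^= np.uint8(1 << 0)   # inject an unimportant (LSB) bit error

approx = xor_decompress(compressed, key)
corrections = [(2, 7)]              # only the MSB error is worth storing
recovered = apply_selected_corrections(approx, corrections)

print("max abs error before selective correction:",
      int(np.max(np.abs(original.astype(int) - approx.astype(int)))))
print("max abs error after selective correction:",
      int(np.max(np.abs(original.astype(int) - recovered.astype(int)))))
```

In this toy run the uncorrected LSB error contributes only a small weight perturbation, which mirrors the paper's premise that not all bit errors matter equally for accuracy.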