A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
Published in | 2019 Symposium on VLSI Circuits pp. C150 - C151 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | JSAP, 01.06.2019 |
Summary: | A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices. |
---|---|
ISSN: | 2158-5636 |
DOI: | 10.23919/VLSIC.2019.8778147 |