A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
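The figure of merit in the title counts useful output: the number of non-zero entries in the product matrix C = A·B, divided by the energy spent producing it. The sketch below is not from the paper. It only shows how such a metric could be computed on a toy example. The dict-of-dicts sparse layout, the row-wise (Gustavson-style) product, the function names, and the energy value are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical sparse layout: {row: {col: value}}, storing only non-zero entries.
SparseMatrix = Dict[int, Dict[int, float]]


def spgemm_rowwise(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Row-by-row (Gustavson-style) sparse product C = A * B."""
    c: SparseMatrix = {}
    for i, a_row in a.items():
        acc = defaultdict(float)
        for k, a_ik in a_row.items():
            # Scale row k of B by A[i][k] and accumulate into row i of C.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        # Drop entries that cancelled to zero so only true non-zeros are counted.
        row = {j: v for j, v in acc.items() if v != 0.0}
        if row:
            c[i] = row
    return c


def output_nonzeros_per_joule(c: SparseMatrix, energy_joules: float) -> float:
    """Figure of merit: number of non-zeros in C divided by energy in joules."""
    nnz = sum(len(row) for row in c.values())
    return nnz / energy_joules


# Toy example; the energy value (0.25 J) is made up for illustration only.
a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = spgemm_rowwise(a, b)
print(c)                                   # {0: {1: 16.0}, 1: {0: 15.0}}
print(output_nonzeros_per_joule(c, 0.25))  # 8.0
```

Read the other way, taking the title figure at face value, producing one million output non-zeros at 7.3 M non-zeros/J would cost about 1/7.3 ≈ 0.14 J.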
Published in | 2019 Symposium on VLSI Technology pp. C150 - C151 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | The Japan Society of Applied Physics, 01.06.2019 |
Summary: | A sparse matrix-matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits a 12.6× (8.4×) energy-efficiency gain, an 11.7× (77.6×) off-chip bandwidth-efficiency gain, and a 17.1× (36.9×) compute-density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world sparse matrices based on power-law graphs. |
ISSN: | 2158-9682 |
DOI: | 10.23919/VLSIT.2019.8776507 |
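The summary says the on-chip memories are reconfigured for "each phase of the algorithm" but does not name the phases. One common SpMM formulation with two distinct phases is the outer product: a multiply phase pairs column k of A with row k of B for every k and emits partial products, and a merge phase sums the partial products that land on the same output coordinate. The sketch below assumes that formulation purely for illustration; it is not taken from the paper, and the triplet layout and function names are hypothetical. The two phases touch memory in different ways (the multiply phase streams the inputs, while the merge phase does data-dependent accumulation per output row), which is the kind of difference a scratchpad/cache reconfiguration could target; that mapping is likewise an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical COO-style triplets: (row, col, value).
Triplet = Tuple[int, int, float]


def multiply_phase(a: List[Triplet], b: List[Triplet]) -> Dict[int, List[Triplet]]:
    """Outer-product phase: pair column k of A with row k of B for every k."""
    a_cols: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for i, k, v in a:  # A[i][k]
        a_cols[k].append((i, v))
    b_rows: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for k, j, v in b:  # B[k][j]
        b_rows[k].append((j, v))

    # Partial products, grouped by output row so the merge phase can work per row.
    partials: Dict[int, List[Triplet]] = defaultdict(list)
    for k, col in a_cols.items():
        for i, a_ik in col:
            for j, b_kj in b_rows.get(k, []):
                partials[i].append((i, j, a_ik * b_kj))
    return partials


def merge_phase(partials: Dict[int, List[Triplet]]) -> List[Triplet]:
    """Merge phase: sum partial products that share an output coordinate."""
    out: List[Triplet] = []
    for i in sorted(partials):
        acc: Dict[int, float] = defaultdict(float)
        for _, j, p in partials[i]:
            acc[j] += p
        # Emit row i in column order, skipping entries that cancelled to zero.
        out.extend((i, j, v) for j, v in sorted(acc.items()) if v != 0.0)
    return out


a = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
b = [(0, 1, 4.0), (1, 0, 5.0), (2, 1, 6.0)]
partials = multiply_phase(a, b)
print(merge_phase(partials))  # [(0, 1, 16.0), (1, 0, 15.0)]
```

In this toy case the multiply phase emits three partial products for only two output non-zeros, so the merge phase has real reduction work to do; on power-law matrices that imbalance between partial products and final outputs can grow large.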