A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in e...

Full description

Saved in:
Bibliographic Details
Published in2019 Symposium on VLSI Circuits pp. C150 - C151
Main Authors Pal, Subhankar, Park, Dong-hyeon, Feng, Siying, Gao, Paul, Tan, Jielun, Rovinski, Austin, Xie, Shaolin, Zhao, Chun, Amarnath, Aporva, Wesley, Timothy, Beaumont, Jonathan, Chen, Kuan-Yu, Chakrabarti, Chaitali, Taylor, Michael, Mudge, Trevor, Blaauw, David, Kim, Hun-Seok, Dreslinski, Ronald
Format Conference Proceeding
LanguageEnglish
Published JSAP 01.06.2019
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm \times 2.6 mm chip exhibits 12.6 \times (8.4\times) energy efficiency gain, 11.7\times (77.6\times) off-chip bandwidth efficiency gain and17.1\times (36.9\times) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
ISSN:2158-5636
DOI:10.23919/VLSIC.2019.8778147