A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Bibliographic Details
Published in: 2019 Symposium on VLSI Technology, pp. C150-C151
Main Authors: Pal, Subhankar, Park, Dong-hyeon, Feng, Siying, Gao, Paul, Tan, Jielun, Rovinski, Austin, Xie, Shaolin, Zhao, Chun, Amarnath, Aporva, Wesley, Timothy, Beaumont, Jonathan, Chen, Kuan-Yu, Chakrabarti, Chaitali, Taylor, Michael, Mudge, Trevor, Blaauw, David, Kim, Hun-Seok, Dreslinski, Ronald
Format: Conference Proceeding
Language: English
Published: The Japan Society of Applied Physics, 01.06.2019
Summary: A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits a 12.6× (8.4×) energy efficiency gain, an 11.7× (77.6×) off-chip bandwidth efficiency gain and a 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.
ISSN: 2158-9682
DOI: 10.23919/VLSIT.2019.8776507
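The summary refers to "each phase of the algorithm" without naming the phases, and the title reports throughput in output non-zeros per joule. The following is a minimal Python sketch, assuming an outer-product SpMM formulation with a multiply phase and a merge phase (an assumption; this record does not state the chip's dataflow). The function names and the dict-based sparse formats are hypothetical. The sketch illustrates only the arithmetic of the algorithm and of the title's metric, not the accelerator's memory reconfiguration.

```python
from collections import defaultdict


def outer_product_spmm(a_cols, b_rows):
    """Compute C = A @ B by summing outer products (hypothetical helper).

    a_cols: {k: [(i, value), ...]}  non-zeros of column k of A (CSC-like)
    b_rows: {k: [(j, value), ...]}  non-zeros of row k of B (CSR-like)
    Returns C as {i: {j: value}}.
    """
    # Multiply phase (assumed): each pair (column k of A, row k of B) yields
    # partial products, bucketed by output row i for the merge phase.
    partials = defaultdict(list)
    for k, col in a_cols.items():
        row_k = b_rows.get(k)
        if not row_k:
            continue  # an empty row of B contributes nothing
        for i, a_val in col:
            for j, b_val in row_k:
                partials[i].append((j, a_val * b_val))

    # Merge phase (assumed): accumulate partial products sharing (i, j).
    c = {}
    for i, plist in partials.items():
        acc = defaultdict(float)
        for j, v in plist:
            acc[j] += v
        c[i] = dict(acc)
    return c


def count_output_nonzeros(c):
    """Number of stored entries in the result C."""
    return sum(len(row) for row in c.values())


def output_nonzeros_per_joule(c, energy_joules):
    """Throughput metric in the style of the title: output non-zeros per joule.
    The energy value is a caller-supplied input, not a measured figure."""
    return count_output_nonzeros(c) / energy_joules


# A = [[1, 0], [2, 3]] stored by column; B = [[4, 5], [0, 6]] stored by row.
a = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
b = {0: [(0, 4.0), (1, 5.0)], 1: [(1, 6.0)]}
print(outer_product_spmm(a, b))  # {0: {0: 4.0, 1: 5.0}, 1: {0: 8.0, 1: 28.0}}
```

Bucketing partial products by output row keeps the two phases independent: the multiply phase streams through the inputs once, and the merge phase can process each output row separately. This separation is one plausible reason different phases would favor different on-chip memory configurations, though the record itself does not say so.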