A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Bibliographic Details
Published in	2019 Symposium on VLSI Technology, pp. C150-C151
Main Authors	Pal, Subhankar; Park, Dong-hyeon; Feng, Siying; Gao, Paul; Tan, Jielun; Rovinski, Austin; Xie, Shaolin; Zhao, Chun; Amarnath, Aporva; Wesley, Timothy; Beaumont, Jonathan; Chen, Kuan-Yu; Chakrabarti, Chaitali; Taylor, Michael; Mudge, Trevor; Blaauw, David; Kim, Hun-Seok; Dreslinski, Ronald
Format	Conference Proceeding
Language	English
Published	The Japan Society of Applied Physics 01.06.2019
Subjects	Bandwidth; Central Processing Unit; decoupled access-execution; Graphics processing units; Indexes; reconfigurability and accelerator; Semiconductor device measurement; Sparse matrices; Sparse matrix multiplier; synthesizable crossbar; System-on-chip

More Information
Summary:	A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits a 12.6× (8.4×) energy efficiency gain, an 11.7× (77.6×) off-chip bandwidth efficiency gain, and a 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph-based sparse matrices.
ISSN:	2158-9682
DOI:	10.23919/VLSIT.2019.8776507