Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Bibliographic Details
Published in: 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 240-242
Main Authors: Pal, Subhankar; Venkataramani, Swagath; Srinivasan, Viji; Gopalakrishnan, Kailash
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2021

Summary: A prevalent challenge for Deep Learning (DL) accelerators is programming them to sustain high utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical given the trend toward complex network topologies and the emergence of eager execution. This work demonstrates that, on a set of image, object, and language networks, there exists up to a 5.2x performance gap in DL inference that can be bridged through SPM management. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants: OnSRAM-Static, which works on static graphs to identify data structures that should be held on-chip based on their properties, and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to hold or discard data structures. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager reduce inference latency (batch size of 1) by 1.02-4.8x and 1.02-3.1x, respectively, over a baseline with no SPM management.
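
The abstract names the two policies but not their mechanics. As a rough illustration only, the Python sketch below pairs a greedy reuse-density planner (a stand-in for OnSRAM-Static) with an LRU-style speculative cache (a stand-in for OnSRAM-Eager). Every identifier here (Tensor, plan_spm, EagerSPMCache) and the scoring heuristic are assumptions made for illustration, not details taken from the paper.

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Tensor:
    # Hypothetical per-tensor metadata; the paper only says tensors are
    # held on-chip "based on their properties".
    name: str
    size_bytes: int   # SPM footprint
    reuses: int       # number of consumer ops after the producer
    liveness: int     # ops between producer and last consumer


def plan_spm(tensors, spm_capacity_bytes):
    """Greedy static-graph planner in the spirit of OnSRAM-Static.

    Scores each tensor by reuse density (reuses per op of live range)
    and pins the best-scoring tensors that fit. Treating all pinned
    tensors as coexisting is a conservative simplification; a real
    planner would track liveness overlap.
    """
    def score(t):
        return t.reuses / max(t.liveness, 1)

    pinned, used = [], 0
    for t in sorted(tensors, key=score, reverse=True):
        if used + t.size_bytes <= spm_capacity_bytes:
            pinned.append(t.name)
            used += t.size_bytes
    return pinned


class EagerSPMCache:
    """Speculative hold/discard in the spirit of OnSRAM-Eager.

    With no graph, future consumers are unknown, so each op's output is
    speculatively held on-chip and evicted in LRU order when the
    scratch-pad fills up.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # name -> size, oldest first

    def on_output(self, name, size_bytes):
        if name in self.entries:  # already held on-chip
            return
        # Make room, discarding the coldest outputs first.
        while self.used + size_bytes > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        if size_bytes <= self.capacity:
            self.entries[name] = size_bytes
            self.used += size_bytes

    def on_use(self, name):
        # True -> on-chip hit (DRAM fetch avoided); False -> miss.
        if name in self.entries:
            self.entries.move_to_end(name)
            return True
        return False

The contrast this sketch preserves is the one the abstract draws: the static planner ranks tensors with whole-graph knowledge, while the eager cache must decide at output time with no lookahead and can only speculate that a freshly produced tensor will be reused.
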
DOI: 10.1109/ISPASS51385.2021.00046