Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators
A prevalent challenge for Deep Learning (DL) accelerators is how they are programmed to sustain utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Ne...
Saved in:
Published in | 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) pp. 240 - 242 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
01.03.2021
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | A prevalent challenge for Deep Learning (DL) accelerators is how they are programmed to sustain utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical due to trends in complex network topologies and the emergence of eager execution. This work demonstrates that there exists up to a 5.2x performance gap in DL inference to be bridged using SPM management, on a set of image, object and language networks. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants, viz. OnSRAM-Static, which works on static graphs to identify data structures that should be held on-chip based on their properties, and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to hold/discard data structures. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager achieve reductions in inference latency (batch size of 1) of 1.02-4.8 x and 1.02-3.1 x, respectively, over a baseline with no SPM management. |
---|---|
DOI: | 10.1109/ISPASS51385.2021.00046 |