Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Bibliographic Details
Published in: 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 240-242
Main Authors: Pal, Subhankar; Venkataramani, Swagath; Srinivasan, Viji; Gopalakrishnan, Kailash
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2021

Summary: A prevalent challenge for Deep Learning (DL) accelerators is programming them to sustain high utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical given the trend toward complex network topologies and the emergence of eager execution. This work demonstrates that, on a set of image, object, and language networks, there exists up to a 5.2x performance gap in DL inference that can be bridged through SPM management. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants: OnSRAM-Static, which works on static graphs to identify data structures that should be held on-chip based on their properties, and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to hold or discard data structures. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager reduce inference latency (batch size of 1) by 1.02-4.8x and 1.02-3.1x, respectively, over a baseline with no SPM management.
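
The abstract names the two policies but not their mechanics. As a rough illustration only, the Python sketch below pairs a greedy reuse-density planner (a stand-in for OnSRAM-Static) with an LRU-style speculative cache (a stand-in for OnSRAM-Eager). Every identifier here (Tensor, plan_spm, EagerSPMCache) and the scoring heuristic are assumptions made for illustration, not details taken from the paper.

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Tensor:
    # Hypothetical per-tensor metadata; the paper only says tensors are
    # held on-chip "based on their properties".
    name: str
    size_bytes: int   # SPM footprint
    reuses: int       # number of consumer ops after the producer
    liveness: int     # ops between producer and last consumer


def plan_spm(tensors, spm_capacity_bytes):
    """Greedy static-graph planner in the spirit of OnSRAM-Static.

    Scores each tensor by reuse density (reuses per op of live range)
    and pins the best-scoring tensors that fit. Treating all pinned
    tensors as coexisting is a conservative simplification; a real
    planner would track liveness overlap.
    """
    def score(t):
        return t.reuses / max(t.liveness, 1)

    pinned, used = [], 0
    for t in sorted(tensors, key=score, reverse=True):
        if used + t.size_bytes <= spm_capacity_bytes:
            pinned.append(t.name)
            used += t.size_bytes
    return pinned


class EagerSPMCache:
    """Speculative hold/discard in the spirit of OnSRAM-Eager.

    With no graph, future consumers are unknown, so each op's output is
    speculatively held on-chip and evicted in LRU order when the
    scratch-pad fills up.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # name -> size, oldest first

    def on_output(self, name, size_bytes):
        if name in self.entries:  # already held on-chip
            return
        # Make room, discarding the coldest outputs first.
        while self.used + size_bytes > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        if size_bytes <= self.capacity:
            self.entries[name] = size_bytes
            self.used += size_bytes

    def on_use(self, name):
        # True -> on-chip hit (DRAM fetch avoided); False -> miss.
        if name in self.entries:
            self.entries.move_to_end(name)
            return True
        return False

The contrast this sketch preserves is the one the abstract draws: the static planner ranks tensors with whole-graph knowledge, while the eager cache must decide at output time with no lookahead and can only speculate that a freshly produced tensor will be reused.
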
DOI: 10.1109/ISPASS51385.2021.00046