Optimization Approach to Accelerator Codesign

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, no. 6, pp. 1300-1313
Main Authors: Prajapati, Nirmal; Rajopadhye, Sanjay; Djidjev, Hristo; Santhi, Nandakishore; Grosser, Tobias; Andonov, Rumen
Format: Journal Article
Language: English
Published: New York, IEEE, 01.06.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable general-purpose graphics processing units (GPGPUs). We first introduce a simple, analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution-time model and formulate a mathematical optimization problem that seeks to maximize a common objective function of all the hardware and software parameters. The solution to this problem, therefore, "solves" the codesign problem: simultaneously choosing software-hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2-D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3-D ones (Heat and Laplacian). Our model predicts that performance would potentially improve by 28% (respectively, 33%) with simple tweaks to the hardware parameters, such as tuning the number of streaming multiprocessors, the number of compute cores each contains, and the size of shared memory. We also develop a number of insights about the optimal regions of the design landscape.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2019.2926489
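
Illustrative note: the codesign formulation described in the summary (an area model, an execution-time model, and a joint search over hardware and software parameters) can be sketched as a small brute-force search. This is a toy under stated assumptions, not the authors' model: every coefficient, the candidate parameter grids, the roofline-style time formula, and the function names `area`, `exec_time`, and `codesign` are hypothetical placeholders. The sketch minimizes total execution time over a workload, which is one simple stand-in for the performance objective the summary mentions.

```python
import math
from itertools import product

# Hypothetical area coefficients (mm^2); NOT values from the article.
AREA_SM_OVERHEAD = 2.0      # fixed cost per streaming multiprocessor (SM)
AREA_PER_CORE = 0.05        # cost per compute core
AREA_PER_KB_SHARED = 0.02   # cost per KB of shared memory
MEM_RATE = 1024.0           # hypothetical points/unit-time the memory system sustains

# Hypothetical candidate grids for hardware and software parameters.
SM_CHOICES = range(8, 33, 4)
CORE_CHOICES = (32, 64, 128, 256)
SHARED_CHOICES = (16, 48, 96, 192)   # KB per SM
TILE_CHOICES = (16, 32, 64, 128)     # software parameter: square tile edge


def area(n_sm, cores_per_sm, shared_kb):
    """Toy linear silicon-area model for one hardware configuration."""
    per_sm = (AREA_SM_OVERHEAD
              + cores_per_sm * AREA_PER_CORE
              + shared_kb * AREA_PER_KB_SHARED)
    return n_sm * per_sm


def exec_time(n_sm, cores_per_sm, shared_kb, tile, work, bytes_per_point=4.0):
    """Toy time model: max of compute time and memory time for a 2-D stencil."""
    tile_kb = tile * tile * bytes_per_point / 1024.0
    if tile_kb > shared_kb:
        return math.inf  # tile does not fit in shared memory
    compute = work / (n_sm * cores_per_sm)
    memory = work * (1.0 + 4.0 / tile) / MEM_RATE  # halo overhead shrinks with tile size
    return max(compute, memory)


def codesign(area_budget, workload):
    """Search hardware configs within the area budget; pick the best tile per stencil."""
    best = None
    for n_sm, cores, shared_kb in product(SM_CHOICES, CORE_CHOICES, SHARED_CHOICES):
        if area(n_sm, cores, shared_kb) > area_budget:
            continue
        total = sum(
            min(exec_time(n_sm, cores, shared_kb, t, work) for t in TILE_CHOICES)
            for work in workload
        )
        if math.isinf(total):
            continue
        if best is None or total < best[0]:
            best = (total, n_sm, cores, shared_kb)
    return best
```

Calling `codesign(200.0, [1e6, 2e6])` returns a tuple `(total_time, n_sm, cores_per_sm, shared_kb)` for a hypothetical 200 mm^2 budget and two stencils of different sizes. Exhaustive search is used here only because the toy grid is tiny; the article formulates the problem as a mathematical optimization instead.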