Optimization Approach to Accelerator Codesign

Bibliographic Details
Published in	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, no. 6, pp. 1300-1313
Main Authors	Prajapati, Nirmal; Rajopadhye, Sanjay; Djidjev, Hristo; Santhi, Nandakishore; Grosser, Tobias; Andonov, Rumen
Format	Journal Article
Language	English
Published	New York: IEEE, 01.06.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Analytical models; Automatic parallelization; Co-design; Computational modeling; design space exploration; Embedded systems; Graphics processing units; graphics processing units (GPUs); Hardware; Landscape design; Mathematical model; Mathematical models; Multiprocessing; Optimization; Parameters; polyhedral model; Software; Three dimensional models; tiling; Workload; Workloads

More Information
Summary:	We propose an optimization approach for determining both the hardware and the software parameters for efficiently implementing a family of applications, dense stencil computations, on programmable general-purpose graphics processing units (GPGPUs). We first introduce a simple analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution-time model and formulate a mathematical optimization problem that maximizes a common objective function over all the hardware and software parameters. The solution to this problem therefore "solves" the codesign problem: it simultaneously chooses software and hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2-D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3-D ones (Heat and Laplacian). Our model predicts that performance could improve by 28% (respectively, 33%) with simple tweaks to the hardware parameters, such as tuning the number of streaming multiprocessors, the number of compute cores each contains, and the size of shared memory. We also develop a number of insights about the optimal regions of the design landscape.
ISSN:	0278-0070; 1937-4151
DOI:	10.1109/TCAD.2019.2926489
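
Illustrative sketch: the summary describes the codesign problem as a joint search over hardware and software parameters under a silicon-area model and an execution-time model. The Python sketch below shows what such a formulation can look like in its simplest form. It is not the article's model: every function name, constant, and formula here is a hypothetical placeholder, the time estimate is a crude roofline-style maximum of compute and memory time, and exhaustive enumeration stands in for whatever solver the authors actually use.

```python
from itertools import product

# All constants are made-up placeholders, NOT values from the article.
AREA_SM_OVERHEAD = 2.0    # mm^2 of fixed logic per streaming multiprocessor (SM)
AREA_PER_CORE = 0.05      # mm^2 per compute core
AREA_PER_KB_SMEM = 0.02   # mm^2 per KB of shared memory
OPS_PER_CORE = 1.0e8      # assumed sustained operations per second per core
MEM_BANDWIDTH = 200.0e9   # bytes per second, held fixed across designs
BYTES_PER_POINT = 4       # single-precision grid values


def chip_area(n_sm, cores_per_sm, smem_kb):
    """Toy linear area model: per-SM area times the number of SMs."""
    per_sm = (AREA_SM_OVERHEAD
              + AREA_PER_CORE * cores_per_sm
              + AREA_PER_KB_SMEM * smem_kb)
    return n_sm * per_sm


def exec_time(n_sm, cores_per_sm, smem_kb, tile,
              n=4096, steps=100, ops_per_point=10):
    """Toy time estimate for a 2-D stencil on an n x n grid.

    Returns None when the tile does not fit in shared memory.
    """
    # Two buffers (input and output) for a square tile with a halo of width 1.
    footprint = 2 * (tile + 2) ** 2 * BYTES_PER_POINT
    if footprint > smem_kb * 1024:
        return None
    points = n * n * steps
    compute = points * ops_per_point / (n_sm * cores_per_sm * OPS_PER_CORE)
    # Reads include the halo, so smaller tiles move more bytes per point;
    # the trailing "+ 1" accounts for writing each point once.
    traffic = points * BYTES_PER_POINT * (((tile + 2) / tile) ** 2 + 1)
    memory = traffic / MEM_BANDWIDTH
    return max(compute, memory)


def codesign(area_budget, sm_opts, core_opts, smem_opts, tile_opts):
    """Enumerate hardware (SMs, cores, shared memory) and software (tile)
    choices together; return the fastest design within the area budget
    as (time, n_sm, cores_per_sm, smem_kb, tile), or None if none fits."""
    best = None
    for n_sm, cores, smem, tile in product(sm_opts, core_opts,
                                           smem_opts, tile_opts):
        if chip_area(n_sm, cores, smem) > area_budget:
            continue
        t = exec_time(n_sm, cores, smem, tile)
        if t is None:
            continue
        if best is None or t < best[0]:
            best = (t, n_sm, cores, smem, tile)
    return best


if __name__ == "__main__":
    # Hypothetical baseline design; its area is reused as the budget.
    budget = chip_area(16, 128, 96)
    result = codesign(budget,
                      sm_opts=range(8, 33, 4),
                      core_opts=(32, 64, 128, 192),
                      smem_opts=(16, 32, 48, 64, 96),
                      tile_opts=(8, 16, 32, 64))
    print("baseline time:", exec_time(16, 128, 96, 64))
    print("best design (time, SMs, cores/SM, smem KB, tile):", result)
```

Because the baseline design is itself in the search space, the design returned is never slower than the baseline under this toy model. The point of the sketch is only the structure of the problem: the tile size (software) and the shared-memory size (hardware) are coupled through the footprint constraint, so neither can be chosen well in isolation.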