Loop Optimization for Divergence Reduction on GPUs with SIMT Architecture

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 26, No. 6, pp. 1633–1642
Main Author: Novak, Roman
Format: Journal Article
Language: English
Published: IEEE, 01.06.2015
Summary: The single-instruction multiple-thread (SIMT) architecture found in recent graphics processing units (GPUs) builds on conventional single-instruction multiple-data (SIMD) parallelism while adopting the thread programming model. The architecture suffers degraded performance from inefficient divergence handling, a problem hidden by the programmer's view of independent threads. This paper investigates a loop optimization technique with the potential to increase the efficiency of the core SIMD block while processing embedded divergences. Concurrent loops are generally not bound to iterate in lock-step, which allows thread flows to be better aligned via iteration scheduling. The efficiency of the concept is analyzed for fixed and flow-adapting scheduling policies. The proposed payoff model captures the implications of loop overhead, allowing one to assess the trade-offs of applying the technique to a specific loop instance. As several examples demonstrate, processing speedups generally carry over to total running time when kernels are compute-bound. The studied iteration scheduling policies require no alterations to the core SIMD concept and design, thus preserving the benefits of data-level parallelism.
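
To make the divergence problem and the iteration-scheduling idea concrete, below is a minimal CUDA sketch. It is not the paper's algorithm: the kernel names, the branch condition, and the majority-vote deferral policy are illustrative assumptions. The baseline kernel serializes both branch paths whenever lanes of a warp disagree; the second kernel exploits the fact that concurrent loop iterations need not run in lock-step, letting a lane defer an iteration until its branch direction matches the warp majority.

#include <cuda_runtime.h>

// Baseline: lanes of a warp decide the branch independently, so the
// warp serializes both sides of the branch in every iteration where
// lanes disagree, and idles lanes whose trip count is already exhausted.
__global__ void loop_baseline(const int *trip, const float *x,
                              float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    float acc = 0.0f;
    for (int i = 0; i < trip[tid]; ++i) {
        if (x[tid] * i < 1.0f)            // divergent branch in the loop body
            acc += x[tid] / (1.0f + i);
        else
            acc -= x[tid];
    }
    out[tid] = acc;
}

// Flow-adapting sketch: since iterations need not run in lock-step, a
// lane defers an iteration whose branch direction disagrees with the
// warp majority, so most lanes walk the same path each step. Majority
// lanes always advance, so the loop terminates for finite trip counts.
// Requires the CUDA 9+ warp-vote intrinsics (__ballot_sync).
__global__ void loop_scheduled(const int *trip, const float *x,
                               float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    bool inRange = tid < n;                // no early return: every lane of
    int myTrip = inRange ? trip[tid] : 0;  // the warp must reach the votes
    float xv = inRange ? x[tid] : 0.0f;
    float acc = 0.0f;
    int i = 0;
    while (true) {
        bool active = i < myTrip;
        unsigned activeMask = __ballot_sync(0xffffffffu, active);
        if (activeMask == 0) break;        // every lane has finished its loop
        bool takeA = active && (xv * i < 1.0f);
        unsigned aMask = __ballot_sync(0xffffffffu, takeA);
        bool majorityA = 2 * __popc(aMask) >= __popc(activeMask);
        if (active && takeA == majorityA) {   // aligned with majority: run now
            if (takeA) acc += xv / (1.0f + i);
            else       acc -= xv;
            ++i;
        }                                     // else: defer this iteration
    }
    if (inRange) out[tid] = acc;
}

The deferral logic adds warp-vote overhead at every step, which is exactly the kind of loop-overhead cost that the paper's payoff model weighs against the savings from aligned thread flows.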
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2014.2324587