GLES: A Practical GPGPU Optimizing Compiler Using Data Sharing and Thread Coarsening

Bibliographic Details
Published in	Languages and Compilers for Parallel Computing, pp. 36-50
Main Authors	Lin, Zhen; Gao, Xiaopeng; Wan, Han; Jiang, Bo
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing, 2015
Series	Lecture Notes in Computer Science
Subjects	Compiler; GPGPU; Optimization
Summary:	Writing optimized CUDA programs for General-Purpose Graphics Processing Units (GPGPUs) is complicated and error-prone. Most earlier compiler optimization methods are impractical for many applications that contain divergent control flow, and they fail to fully exploit optimization opportunities in data sharing and thread coarsening. In this paper, we present GLES, an optimizing compiler for GPGPU programs. GLES proposes two optimization techniques based on divergence analysis. The first is data sharing optimization for data reuse and bandwidth enhancement. The second is thread granularity coarsening for reducing redundant instructions. Our experiments on 6 real-world programs show that GPGPU programs optimized by GLES achieve performance similar to that of manually tuned GPGPU programs. Furthermore, GLES is not only applicable to a much wider range of GPGPU programs than the state-of-the-art GPGPU optimizing compiler, but also achieves higher or comparable performance on 8 out of 9 benchmarks.
ISBN:	331917472X 9783319174723
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-319-17473-0_3