GLES: A Practical GPGPU Optimizing Compiler Using Data Sharing and Thread Coarsening
Published in | Languages and Compilers for Parallel Computing pp. 36 - 50 |
---|---|
Main Authors | , , , |
Format | Book Chapter |
Language | English |
Published | Cham: Springer International Publishing, 2015 |
Series | Lecture Notes in Computer Science |
Summary: | Writing optimized CUDA programs for General-Purpose Graphics Processing Units (GPGPUs) is complicated and error-prone. Most earlier compiler optimization methods are impractical for the many applications that contain divergent control flow, and they fail to fully exploit optimization opportunities in data sharing and thread coarsening. In this paper, we present GLES, an optimizing compiler for GPGPU programs. GLES proposes two optimization techniques based on divergence analysis. The first is data sharing optimization, for data reuse and bandwidth enhancement. The second is thread granularity coarsening, for reducing redundant instructions. Our experiments on 6 real-world programs show that GPGPU programs optimized by GLES achieve performance similar to that of manually tuned GPGPU programs. Furthermore, GLES is not only applicable to a much wider range of GPGPU programs than the state-of-the-art GPGPU optimizing compiler, but it also achieves higher or comparable performance on 8 out of 9 benchmarks. |
---|---|
ISBN: | 331917472X 9783319174723 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-319-17473-0_3 |