Pragma Directed Shared Memory Centric Optimizations on GPUs

Bibliographic Details
Published in	Journal of Computer Science and Technology, Vol. 31, no. 2, pp. 235-252
Authors	Jing Li (CCF), Lei Liu, Yuan Wu, Xiang-Hua Liu, Yi Gao, Xiao-Bing Feng, Cheng-Yong Wu
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2016; Springer Nature B.V.
Affiliations	State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Beijing Samsung Telecom Research and Development Center, Beijing 100028, China
Subjects	Analysis; Architecture; Arrays; Artificial Intelligence; Bandwidths; Central processing units; Compilers; Computer programming; Computer Science; Computer simulation; Concurrent processing; Conveying; CPUs; Data management; Data Structures and Information Theory; Design; Design optimization; GPU; Graphics processing units; Information Systems Applications (incl. Internet); Mathematical models; Optimization; Optimization techniques; Parallel processing; Parameters; Performance enhancement; Programmers; R&D; Regular Paper; Research & development; Resource utilization; Science; Software Engineering; Studies; Theory of Computation; shared memory; memory optimization; polyhedral model; bandwidth utilization; parallel processing capability; data centric; pragma directed
Summary:	GPUs have become a ubiquitous choice of coprocessor because of their excellent concurrent processing ability. In GPU architecture, shared memory plays a very important role in system performance, as it can greatly improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications with regular access patterns, optimizing for shared memory is not easy: it often requires programmer expertise and nontrivial parameter selection, and improper shared memory usage may even leave GPU resources underutilized. Even with state-of-the-art high-level programming models (e.g., OpenACC and OpenHMPP), shared memory remains hard to exploit, because these models lack inherent support for describing shared memory optimizations and selecting suitable parameters, let alone for maintaining high resource utilization. Targeting higher productivity for affine applications, we propose a data-centric approach to shared memory optimization on GPUs. We design a pragma extension to OpenACC that conveys programmers' data management hints to the compiler. Meanwhile, we devise a compiler framework that uses the polyhedral model to automatically select optimal parameters for shared arrays. We further propose optimization techniques to expose more memory-level and instruction-level parallelism. Experimental results show that our shared memory centric approaches improve the performance of five typical GPU applications by 3.7x on average across four widely used platforms, without burdening programmers with many pragmas.
Keywords:	GPU, shared memory, pragma directed, data centric
CN:	11-2296/TP
ISSN:	1000-9000 (print); 1860-4749 (online)
DOI:	10.1007/s11390-016-1624-8