A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs


Bibliographic Details
Published in	Journal of Computer Science and Technology, Vol. 27, no. 1, pp. 57-74
Main Author	Yang Yang; Hui-Min Cui; Xiao-Bing Feng; Jing-Ling Xue
Format	Journal Article
Language	English
Published	Boston: Springer US, 2012; Springer Nature B.V.
Subjects	Algorithms; Analysis; Artificial Intelligence; Balancing; Central processing units; Communication; Computation; Computer memory; Computer Science; Computers; CPUs; Data Structures and Information Theory; Decomposition; Exploitation; Graphics boards; Information Systems Applications (incl. Internet); Iterative methods; Placement; Platforms; Queues; Queuing; R&D; Registers; Regular Paper; Research & development; Reuse; Software Engineering; Stencils; Studies; Theory of Computation; shared memory; memory sharing; graphics; stencil; circular queue; data; register; hybrid method; computation; iterative method; occupancy; GPU; stencil computation
Summary:	In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods, which rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern that spans multiple time steps in stencil computations, so that circular queues can be implemented effectively in both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.
Bibliography:	Keywords: stencil computation, circular queue, GPU, occupancy, register. 11-2296/TP. Yang Yang, Hui-Min Cui, Xiao-Bing Feng, Jing-Ling Xue (1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China; 3. Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia)
ISSN:	1000-9000; 1860-4749
DOI:	10.1007/s11390-012-1206-3
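
Illustrative note: the summary contrasts circular queues held in indirectly addressable shared memory with queues held in registers. The sketch below is a CPU-only analogy in Python, not the paper's framework or its GPU code. It assumes a single time step of a 5-point averaging stencil on a grid of at least 3 x 3 cells, and every function name in it is hypothetical. `stencil_ring` keeps three rows in a queue indexed modulo 3 (the indirect-addressing style), while `stencil_rotating` keeps the same three rows in fixed names that are rotated explicitly (the register style).

```python
def point(up, mid, down, j):
    """Average of a cell and its four neighbours (5-point stencil)."""
    return (up[j] + down[j] + mid[j - 1] + mid[j + 1] + mid[j]) / 5.0


def stencil_reference(grid):
    """One step over the whole grid; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = point(grid[i - 1], grid[i], grid[i + 1], j)
    return out


def stencil_ring(grid):
    """Same step, streaming rows through a 3-slot circular queue.

    Slots are selected with a computed index (i % 3), loosely analogous
    to indirectly addressed shared memory.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    queue = [grid[0][:], grid[1][:], None]
    for i in range(1, rows - 1):
        queue[(i + 1) % 3] = grid[i + 1][:]  # overwrite the oldest row
        up, mid, down = queue[(i - 1) % 3], queue[i % 3], queue[(i + 1) % 3]
        for j in range(1, cols - 1):
            out[i][j] = point(up, mid, down, j)
    return out


def stencil_rotating(grid):
    """Same step, with the queue held in three fixed names.

    No computed index is needed; the names are shifted each iteration,
    loosely analogous to keeping the queue in registers.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    up, mid = grid[0][:], grid[1][:]
    for i in range(1, rows - 1):
        down = grid[i + 1][:]
        for j in range(1, cols - 1):
            out[i][j] = point(up, mid, down, j)
        up, mid = mid, down  # rotate the window by one row
    return out
```

Both streaming variants read each input row once and hold only three rows at a time, and they produce the same output as the reference sweep. How to split such a queue between registers and shared memory on a real GPU is the question the paper's framework addresses; this sketch does not attempt it.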