Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations

Bibliographic Details
Published in	2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 132-139
Main Authors	Endo, Toshio; Jin, Guanghao
Format	Conference Proceeding
Language	English
Published	IEEE 01.09.2014
Subjects	Arrays; Bandwidth; Graphics processing units; Libraries; Performance evaluation; Programming; Supercomputers

Summary:	Stencil computations, which are important kernels for CFD simulations, have been highly successful on GPGPU clusters, owing to the high memory bandwidth and computation speed of GPU accelerators. However, the sizes of the computed domains are limited by the small capacity of GPU device memory. To support larger domain sizes, we utilize the memory hierarchy of GPGPU clusters: the larger host memory is used to maintain large domains. However, it is challenging to achieve larger domain sizes, high performance, and ease of program development all at once. Towards this goal, we combine two software technologies. On the algorithmic side, we adopt a locality improvement technique called temporal blocking. On the system software side, we developed an MPI/CUDA wrapper library named HHRT, which supports memory swapping and a finer-grained programming model. With this combination, we demonstrate that our goal is achieved through evaluations on TSUBAME2.5, a petascale GPGPU supercomputer.
ISSN:	1552-5244, 2168-9253
DOI:	10.1109/CLUSTER.2014.6968747
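
Illustration:	The summary names temporal blocking as the locality technique. The sketch below is a minimal, hypothetical illustration of that general idea on a 1D three-point averaging stencil in plain Python. It is not the authors' code and does not use HHRT, MPI, or CUDA. The function names (step, naive, temporal_blocked) and the parameters block and bt are assumptions made for this example. Each block is copied together with a halo of bt cells per side, advanced bt time steps locally, and only its interior is written back. In a GPU setting, the local copy would stand in for data staged from host memory into device memory.

```python
def step(u):
    """One Jacobi sweep of a 3-point average; the two end cells are held fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference version: sweep the whole domain once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporal_blocked(u, steps, block, bt):
    """Advance `steps` time steps, processing `bt` steps per block visit.

    `block` is the number of cells per block and `bt` the temporal block
    size; both are hypothetical tuning parameters for this sketch.
    """
    n = len(u)
    done = 0
    while done < steps:
        k = min(bt, steps - done)          # time steps in this round
        out = u[:]
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(start - k, 0)         # halo of k cells on each side,
            hi = min(end + k, n)           # clipped at the domain boundary
            local = u[lo:hi]               # stands in for a copy to fast memory
            for _ in range(k):
                local = step(local)
            # Halo cells become stale one cell per step, so after k steps
            # only the block interior [start, end) is still exact.
            out[start:end] = local[start - lo:end - lo]
        u = out
        done += k
    return u


if __name__ == "__main__":
    u0 = [float(i % 7) for i in range(50)]
    a = naive(u0, 10)
    b = temporal_blocked(u0, 10, block=16, bt=4)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

The blocked version reads each block once per bt time steps instead of once per step, at the cost of redundant computation in the halo regions. That trade-off is what makes the approach attractive when the full domain lives in a slower, larger memory level.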