Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations

Stencil computations, which are important kernels for CFD simulations, have been highly successful on GPGPU clusters, due to high memory bandwidth and computation speed of GPU accelerators. However, sizes of the computed domains are limited by small capacity of GPU device memory. In order to support...

Full description

Saved in:
Bibliographic Details
Published in2014 IEEE International Conference on Cluster Computing (CLUSTER) pp. 132 - 139
Main Authors Endo, Toshio, Guanghao Jin
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.09.2014
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Stencil computations, which are important kernels for CFD simulations, have been highly successful on GPGPU clusters, due to high memory bandwidth and computation speed of GPU accelerators. However, sizes of the computed domains are limited by small capacity of GPU device memory. In order to support larger domain sizes, we utilize the memory hierarchy of GPGPU clusters; larger host memory is used for maintain large domains. However, it is challenging to achieve all of larger domain sizes, high performance and easiness of program development. Towards this goal, we combine two software technologies. From the aspect of algorithm, we adopt a locality improvement technique called temporal blocking. From the aspect of system software, we developed a MPI/CUDA wrapper library named HHRT, which supports memory swapping and finer grained programming model. With this combination, we demonstrate that our goal is achieved through evaluations on TSUBAME2.5, a petascale GPGPU supercomputer.
ISSN:1552-5244
2168-9253
DOI:10.1109/CLUSTER.2014.6968747