Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations
Published in | 2014 IEEE International Conference on Cluster Computing (CLUSTER) pp. 132 - 139 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.09.2014 |
Summary: | Stencil computations, which are important kernels for CFD simulations, have been highly successful on GPGPU clusters, owing to the high memory bandwidth and computation speed of GPU accelerators. However, the sizes of the computed domains are limited by the small capacity of GPU device memory. To support larger domain sizes, we utilize the memory hierarchy of GPGPU clusters: the larger host memory is used to hold large domains. However, it is challenging to achieve larger domain sizes, high performance, and ease of program development all at once. Towards this goal, we combine two software technologies. On the algorithm side, we adopt a locality improvement technique called temporal blocking. On the system software side, we developed an MPI/CUDA wrapper library named HHRT, which supports memory swapping and a finer-grained programming model. With this combination, we demonstrate that our goal is achieved through evaluations on TSUBAME2.5, a petascale GPGPU supercomputer. |
---|---|
ISSN: | 1552-5244 2168-9253 |
DOI: | 10.1109/CLUSTER.2014.6968747 |
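
The summary names temporal blocking as the locality improvement technique. As a rough illustration only, and not the paper's implementation, the sketch below applies the idea to a 1D 3-point averaging stencil in plain Python. The function names, the stencil itself, and the `block` and `tb` parameters are all hypothetical choices made for this example. Each block is copied together with a halo of `tb` cells per side into a small buffer (standing in for fast memory such as GPU device memory), advanced `tb` time steps there, and only its interior is written back to the large array (standing in for host memory).

```python
def step_naive(u):
    """One Jacobi step of a 3-point averaging stencil; end points stay fixed."""
    n = len(u)
    out = u[:]
    for i in range(1, n - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def run_naive(u, steps):
    """Reference version: sweep the whole domain once per time step."""
    for _ in range(steps):
        u = step_naive(u)
    return u


def run_temporal_blocked(u, steps, block=8, tb=4):
    """Advance `steps` time steps, doing up to `tb` steps per pass over the domain.

    Assumption for this sketch: `u` plays the role of the large, slow memory
    and `buf` the small, fast memory.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(tb, steps - done)       # time steps handled in this pass
        new = u[:]                      # double buffering of the large array
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(start - t, 0)      # halo of t cells on each side,
            hi = min(end + t, n)        # clamped at the domain boundary
            buf = u[lo:hi]              # copy block plus halo to "fast memory"
            for _ in range(t):
                buf = step_naive(buf)   # halo cells go stale one per step
            # Only the block interior is still valid after t steps.
            new[start:end] = buf[start - lo:end - lo]
        u = new
        done += t
    return u
```

The halo cells are recomputed redundantly by neighbouring blocks; that extra work is the price for touching the large array once per `tb` steps instead of once per step. For any input, `run_temporal_blocked(u, steps)` is expected to agree with `run_naive(u, steps)`.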