Overcoming GPU Memory Capacity Limitations in Hybrid MPI Implementations of CFD

In this paper, we describe a hybrid MPI implementation of a discontinuous Galerkin scheme in Computational Fluid Dynamics which can utilize all the available processing units (CPU cores or GPU devices) on each computational node. We describe the optimization techniques used in our GPU implementation...

Full description

Saved in:

Bibliographic Details
Published in	Internet and Distributed Computing Systems Vol. 11874; pp. 100 - 111
Main Authors	Choi, Jake, Kim, Yoonhee, Yeom, Heon-young
Format	Book Chapter
Language	English
Published	Switzerland Springer International Publishing AG 2019 Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	CFD Compression CUDA GPU Memory MPI
Online Access	Get full text

Cover

Loading…

More Information
Summary:	In this paper, we describe a hybrid MPI implementation of a discontinuous Galerkin scheme in Computational Fluid Dynamics which can utilize all the available processing units (CPU cores or GPU devices) on each computational node. We describe the optimization techniques used in our GPU implementation making it up to 74.88x faster than the single core CPU implementation in our machine environment. We also perform experiments on work partitioning between heterogeneous devices to measure the ideal load balance achieving the optimal performance in a single node consisting of heterogeneous processing units. The key problem is that CFD workloads need to allocate large amounts of both host and GPU device memory in order to compute accurate results. There exists an economic burden, not to mention additional communication overheads of simply scaling out by adding more nodes with high-end scientific GPU devices. In a micro-management perspective, workload size in each single node is also limited by its attached GPU memory capacity. To overcome this, we use ZFP, a floating-point compression algorithm to save at least 25% of data usage in our workloads, with less performance degradation than using NVIDIA UM.
Bibliography:	This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (2015M3C4A7065646).
ISBN:	9783030349134 3030349136
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-030-34914-1_10