Development of a Parallel Explicit Finite-Volume Euler Equation Solver using the Immersed Boundary Method with Hybrid MPI-CUDA Paradigm

ABSTRACT This study proposes the application of a novel immersed boundary method (IBM) for the treatment of irregular geometries on Cartesian computational grids for high-speed compressible gas flows modelled by the unsteady Euler equations. Furthermore, the method is accelerated through the use of multiple graphics processing units (GPUs), specifically NVIDIA's CUDA together with MPI, owing to the computationally intensive nature of the numerical solution of the multi-dimensional continuity equations. Because of the high degree of locality required for efficient multiple-GPU computation, the Split Harten-Lax-van Leer (SHLL) scheme is employed for vector splitting of fluxes across cell interfaces. The NVIDIA Visual Profiler shows that the proposed method achieves a computational speed of 98.6 GFLOPS, corresponding to 61% efficiency according to a Roofline analysis that gives a theoretical computing speed of 160 GFLOPS at an average arithmetic intensity of 2.225 operations/byte. To demonstrate the validity of the method, results from several benchmark problems covering both subsonic and supersonic flow regimes are presented. Performance testing using 96 GPU devices demonstrates a speed-up of 89 times over a single GPU (i.e. 92% parallel efficiency) for a benchmark problem employing 48 million cells. Discussions of communication overhead and parallel efficiency for varying problem sizes are also presented.
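
As a rough consistency check on the figures quoted in the abstract (a sketch using the standard Roofline and strong-scaling definitions; the memory bandwidth B is inferred here rather than stated in the record, and the symbols I, S and N are introduced only for this check):

\text{Roofline ceiling:}\quad P_{\max} \approx I \cdot B \;\Rightarrow\; B \approx \frac{160\ \text{GFLOPS}}{2.225\ \text{FLOP/byte}} \approx 71.9\ \text{GB/s}

\text{Achieved fraction of the ceiling:}\quad \frac{98.6\ \text{GFLOPS}}{160\ \text{GFLOPS}} \approx 0.616

\text{Multi-GPU parallel efficiency:}\quad E = \frac{S}{N} = \frac{89}{96} \approx 0.927

These values are consistent with the 61% Roofline efficiency and the 92% parallel efficiency reported in the abstract.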

Bibliographic Details
Published in: Journal of Mechanics, Vol. 36, No. 1, pp. 87-102
Main Authors: Kuo, F. A., Chiang, C. H., Lo, M. C., Wu, J. S.
Format: Journal Article
Language: English
Published: Taipei: Oxford University Press, 01.02.2020

ISSN: 1727-7191; 1811-8216
DOI: 10.1017/jmech.2019.9