Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI

Bibliographic Details
Published in: arXiv.org
Main Authors: Xu, Ao; Li, Bo-Tao
Format: Paper; Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 17.11.2022
Summary: We assess the performance of a hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for thermal lattice Boltzmann (LB) simulations accelerated on multiple graphics processing units (GPUs). OpenACC accelerates the computation on each GPU, and MPI synchronizes information between GPUs. On a single GPU, the two-dimensional (2D) simulation achieved 1.93 billion lattice updates per second (GLUPS) on a grid of \(8193^{2}\) nodes, and the three-dimensional (3D) simulation achieved 1.04 GLUPS on a grid of \(385^{3}\) nodes, which is more than 76% of the theoretical maximum performance. On multiple GPUs, we adopt block partitioning, overlap communication with computation, and use concurrent computation to optimize parallel efficiency. In the strong scaling test with 16 GPUs, the 2D simulation achieved 30.42 GLUPS and the 3D simulation achieved 14.52 GLUPS. In the weak scaling test, the parallel efficiency remains above 99% up to 16 GPUs. Our results demonstrate that, with improved data and task management, the hybrid OpenACC and MPI technique is promising for thermal LB simulations on multiple GPUs.
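To make the throughput figures in the summary easier to interpret, the following minimal Python sketch shows how a GLUPS value is conventionally computed and how the quoted 16-GPU numbers compare with ideal linear scaling. The function names `glups` and `scaling_efficiency` are hypothetical, and dividing the 16-GPU throughput by 16 times the single-GPU throughput is an assumption made here for illustration: the record does not state which baseline or grid size the strong scaling test used, so these ratios are only a rough estimate and not the efficiency reported by the authors.

```python
def glups(num_nodes: int, num_steps: int, seconds: float) -> float:
    """Billion lattice updates per second.

    One lattice update is one grid node advanced by one time step,
    so throughput is nodes * steps / wall-clock time, scaled by 1e9.
    """
    return num_nodes * num_steps / (seconds * 1e9)


def scaling_efficiency(multi_gpu_glups: float,
                       single_gpu_glups: float,
                       num_gpus: int) -> float:
    """Measured throughput relative to ideal linear scaling.

    Assumption: the single-GPU figure is used as the baseline.
    """
    return multi_gpu_glups / (num_gpus * single_gpu_glups)


# Throughput figures quoted in the summary (GLUPS).
eff_2d = scaling_efficiency(30.42, 1.93, 16)
eff_3d = scaling_efficiency(14.52, 1.04, 16)

print(f"2D, 16 GPUs vs. ideal: {eff_2d:.1%}")  # 98.5%
print(f"3D, 16 GPUs vs. ideal: {eff_3d:.1%}")  # 87.3%
```

Under this assumed baseline, the 2D case stays close to linear scaling while the 3D case falls further below it, which would be consistent with 3D subdomains exchanging more boundary data per GPU.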
ISSN: 2331-8422
DOI: 10.48550/arxiv.2211.03160