lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods




Bibliographic Details
Published in	Journal of computational science Vol. 49; p. 101269
Main Authors	Bauer, Martin, Köstler, Harald, Rüde, Ulrich
Format	Journal Article
Language	English
Published	Elsevier B.V 01.02.2021
Subjects	hpc; LBM; Meta programming


More Information
Summary:	• In this article we present a meta-programming system for lattice Boltzmann methods, similar to what FEniCS is for finite element methods.
• Our tool can automatically generate highly efficient, MPI-parallel CPU and GPU code for a wide range of lattice Boltzmann algorithms, including multi-relaxation-time, cumulant, and entropic schemes. The generated compute kernels obtain excellent performance and scaling results, demonstrated on the SuperMUC-NG supercomputing system.

Lattice Boltzmann methods are a popular mesoscopic alternative to classical computational fluid dynamics based on the macroscopic equations of continuum mechanics. Many variants of lattice Boltzmann methods have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, and non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of lattice Boltzmann methods. Additionally, lbmpy provides a generic development environment for new schemes. A high-level domain-specific language allows the user to formulate, extend, and test various lattice Boltzmann methods. In all cases, the lattice Boltzmann method can be specified in symbolic form. Transformations that operate on this symbolic representation yield highly efficient compute kernels. This is achieved by automatically parallelizing the methods and by various application-specific automated steps that optimize the resulting code. This pipeline of transformations can be applied to a wide range of lattice Boltzmann variants, including single- and two-relaxation-time schemes, multi-relaxation-time methods, as well as the more advanced cumulant methods and entropically stabilized methods. lbmpy can be integrated into high-performance computing frameworks to enable massively parallel, distributed simulations.
This is demonstrated using the waLBerla multiphysics package to conduct scaling experiments on the SuperMUC-NG supercomputing system on up to 147 456 compute cores.
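For orientation, the simplest scheme named above, the single-relaxation-time (BGK) method, can be written as a fused collide-and-stream kernel. The following is a minimal, hand-written pure-Python sketch of what such a kernel computes per cell. It is not lbmpy's API and not code that lbmpy generates. It assumes a D2Q9 lattice, periodic boundaries, and the standard second-order equilibrium. The function names `equilibrium`, `moments`, `bgk_step`, and `totals` are hypothetical.

```python
import math

# D2Q9 lattice (an assumption for this sketch): discrete velocities c_i and weights w_i.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1 / 3  # squared lattice speed of sound


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations f_i^eq(rho, u) (hypothetical helper)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + cu / CS2 + cu * cu / (2 * CS2 ** 2) - usq / (2 * CS2)))
    return feq


def moments(f):
    """Density and velocity (the conserved moments) of one cell's populations."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy


def bgk_step(grid, omega):
    """One fused collide-and-stream step on a periodic grid (hypothetical kernel).

    grid[x][y] holds the 9 populations of cell (x, y); a new grid is returned.
    """
    nx, ny = len(grid), len(grid[0])
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            f = grid[x][y]
            feq = equilibrium(*moments(f))
            for i, (cx, cy) in enumerate(C):
                # Collide: relax population i toward equilibrium at rate omega.
                post = f[i] - omega * (f[i] - feq[i])
                # Stream: move it to the neighbor along c_i (periodic wrap-around).
                new[(x + cx) % nx][(y + cy) % ny][i] = post
    return new


def totals(grid):
    """Total mass and momentum, which BGK with periodic streaming conserves."""
    m = px = py = 0.0
    for column in grid:
        for f in column:
            m += sum(f)
            px += sum(fi * cx for fi, (cx, _) in zip(f, C))
            py += sum(fi * cy for fi, (_, cy) in zip(f, C))
    return m, px, py


if __name__ == "__main__":
    nx, ny = 16, 8
    # Decaying shear wave u_x(y) = 0.05 sin(2 pi y / ny), initialized at equilibrium.
    grid = [[equilibrium(1.0, 0.05 * math.sin(2 * math.pi * y / ny), 0.0)
             for y in range(ny)] for _ in range(nx)]
    print("before:", totals(grid))
    for _ in range(100):
        grid = bgk_step(grid, omega=1.0)
    print("after: ", totals(grid))
    print("u_x at y=2:", moments(grid[0][2])[1])
```

The two printed totals should agree up to rounding error. The shear amplitude decays at a rate set by the lattice viscosity ν = c_s²(1/ω − 1/2). Two-relaxation-time and multi-relaxation-time variants replace the single rate ω with separate rates for different components of the populations, and this sketch does not cover them. A generator such as the one described above would emit this per-cell arithmetic as optimized, vectorized, and parallelized CPU or GPU code rather than interpreted Python loops.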
ISSN:	1877-7503; 1877-7511
DOI:	10.1016/j.jocs.2020.101269