Achieving high performance and portable parallel GMRES algorithm for compressible flow simulations on unstructured grids

Bibliographic Details
Published in The Journal of Supercomputing, Vol. 79, no. 17, pp. 20116-20140
Main Authors Zhang, Jian; Deng, Liang; Li, Ruitian; Li, Ming; Liu, Jie; Dai, Zhe
Format Journal Article
Language English
Published New York Springer US 01.11.2023
Springer Nature B.V
Summary: Improving the effectiveness and scalability of implicit algorithms has long attracted researchers in scientific computing. The generalized minimal residual (GMRES) method is one of the efficient algorithms used in Computational Fluid Dynamics (CFD). However, because of its inherently sequential nature, GMRES has difficulty achieving high parallel performance, and the growing diversity of HPC architectures makes it harder to migrate the algorithm between platforms. This work applies the separation-of-concerns principle to propose a performance-portable parallel GMRES algorithm that solves the compressible Navier–Stokes equations on unstructured grids efficiently across different platforms. First, the Jacobian evaluation for GMRES is improved: rather than using the matrix-free method, the algorithm explicitly computes a more accurate, analytically derived Jacobian matrix, which enhances convergence. Second, an explicit matrix makes it convenient to call highly optimized linear algebra libraries to achieve performance and portability: only the high-level Jacobian matrix computation is implemented by hand, while the rest of the algorithm and the low-level optimization for the target architecture are left to the library. Combined with a fine-grained parallel LU-SGS (lower-upper symmetric Gauss–Seidel) preconditioner, the algorithm runs efficiently on multi-core and many-core architectures such as GPUs. The proposed method has been used to compute several typical compressible flow configurations. Experimental results show that it has clear advantages over commonly used implicit algorithms such as matrix-free GMRES and LU-SGS in terms of convergence and portability of parallel performance.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-023-05430-w
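
The summary contrasts two ways of supplying the Jacobian to GMRES: an explicitly assembled, analytically derived matrix versus a matrix-free finite-difference product. The sketch below illustrates that contrast on a toy problem only. It is not the authors' solver: the residual function, the dense matrix storage, the step size `eps`, and all function names are assumptions made for illustration, and the LU-SGS preconditioner, the unstructured-grid discretization, and any parallelism are left out.

```python
import math

def residual(u):
    # Hypothetical toy residual R(u) = A u + u^3 - 1 with A = tridiag(-1, 2, -1);
    # it stands in for a discretized flow residual and is not taken from the paper.
    n = len(u)
    return [2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0) + u[i] ** 3 - 1.0
            for i in range(n)]

def analytic_jacobian(u):
    # Explicit Jacobian dR/du derived by hand (dense storage only for brevity).
    n = len(u)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        J[i][i] = 2.0 + 3.0 * u[i] ** 2
        if i > 0:
            J[i][i - 1] = -1.0
        if i < n - 1:
            J[i][i + 1] = -1.0
    return J

def explicit_matvec(J):
    # Product with the stored matrix; a tuned library would take over this part.
    return lambda v: [sum(a * b for a, b in zip(row, v)) for row in J]

def matrix_free_matvec(u, eps=1e-7):
    # Finite-difference approximation J v ~ (R(u + eps v) - R(u)) / eps.
    r0 = residual(u)
    def mv(v):
        shifted = residual([ui + eps * vi for ui, vi in zip(u, v)])
        return [(a - b) / eps for a, b in zip(shifted, r0)]
    return mv

def gmres(matvec, b, tol=1e-10, max_iter=50):
    # Unrestarted, unpreconditioned GMRES with x0 = 0 and Givens rotations.
    n = len(b)
    beta = math.sqrt(sum(x * x for x in b))
    if beta == 0.0:
        return [0.0] * n
    V, H, cs, sn, g = [[x / beta for x in b]], [], [], [], [beta]
    for k in range(min(max_iter, n)):
        w, h = matvec(V[k]), []
        for j in range(k + 1):  # modified Gram-Schmidt
            hj = sum(a * c for a, c in zip(w, V[j]))
            h.append(hj)
            w = [a - hj * c for a, c in zip(w, V[j])]
        hk1 = math.sqrt(sum(a * a for a in w))
        for j in range(k):  # apply earlier rotations to the new column
            t = cs[j] * h[j] + sn[j] * h[j + 1]
            h[j + 1] = -sn[j] * h[j] + cs[j] * h[j + 1]
            h[j] = t
        d = math.hypot(h[k], hk1)  # new rotation zeroes the subdiagonal entry
        c, s = h[k] / d, hk1 / d
        cs.append(c)
        sn.append(s)
        h[k] = d
        g.append(-s * g[k])
        g[k] = c * g[k]
        H.append(h)
        if abs(g[k + 1]) <= tol * beta or hk1 == 0.0:
            break
        V.append([a / hk1 for a in w])
    m = len(H)
    y = [0.0] * m
    for i in range(m - 1, -1, -1):  # back substitution on the triangular factor
        y[i] = (g[i] - sum(H[j][i] * y[j] for j in range(i + 1, m))) / H[i][i]
    return [sum(y[j] * V[j][i] for j in range(m)) for i in range(n)]

def newton(u, use_explicit, tol=1e-10, max_steps=30):
    # Newton iteration: solve J du = -R(u) with GMRES, then update u.
    for _ in range(max_steps):
        r = residual(u)
        if math.sqrt(sum(x * x for x in r)) < tol:
            break
        mv = explicit_matvec(analytic_jacobian(u)) if use_explicit else matrix_free_matvec(u)
        du = gmres(mv, [-x for x in r])
        u = [a + b for a, b in zip(u, du)]
    return u

if __name__ == "__main__":
    u0 = [1.0] * 8
    print(newton(u0, use_explicit=True))
    print(newton(u0, use_explicit=False))
```

On a problem this small both variants reach essentially the same solution, so the sketch shows only where the two approaches differ structurally. With an explicit matrix, the matrix-vector product becomes a standard operation that an optimized library can provide, whereas the matrix-free product requires an extra residual evaluation per Krylov vector and depends on the choice of `eps`.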