Achieving high performance and portable parallel GMRES algorithm for compressible flow simulations on unstructured grids
Published in | The Journal of supercomputing Vol. 79; no. 17; pp. 20116 - 20140 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.11.2023, Springer Nature B.V |
Summary: | Improving the effectiveness and scalability of implicit algorithms has long attracted researchers in scientific computing. The generalized minimal residual (GMRES) method is one of the efficient algorithms used in computational fluid dynamics (CFD). However, because of its inherently sequential nature, GMRES has difficulty achieving high parallel performance, and the growing diversity of HPC architectures makes migrating the algorithm between platforms harder still. In this work, following the separation-of-concerns principle, a performance-portable parallel GMRES algorithm is proposed for solving the compressible Navier–Stokes equations on unstructured grids efficiently on different platforms. First, the Jacobian evaluation for GMRES is improved: instead of using the matrix-free method, the algorithm explicitly computes a more accurate, analytically derived Jacobian matrix to enhance convergence. In addition, this makes it convenient to call highly optimized linear algebra libraries for performance and portability: only the high-level Jacobian matrix computation is implemented manually, while the rest of the algorithm and the low-level optimization for the target architecture are left to the library. Combined with a fine-grained parallel LU-SGS (lower-upper symmetric Gauss–Seidel) preconditioner, the algorithm runs efficiently on multi-core and many-core architectures such as GPUs. The proposed method has been used to compute several typical compressible flow configurations. Experimental results show that it has clear advantages over commonly used implicit algorithms such as matrix-free GMRES and LU-SGS in terms of convergence and portability of parallel performance. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-023-05430-w |
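
The division of labour described in the summary (assemble an explicit analytic Jacobian by hand, then pass it to a library GMRES together with a symmetric Gauss–Seidel style preconditioner) can be illustrated on a toy problem. The sketch below is not the authors' solver: it assumes a 1D model equation, -u'' + u^3 = f with zero boundary values, in place of the compressible Navier–Stokes equations, uses SciPy as a stand-in for the optimized linear algebra library, and applies the preconditioner with sequential triangular sweeps rather than the fine-grained parallel LU-SGS of the paper. All function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spsolve_triangular


def residual(u, h, f):
    """Residual of -u'' + u^3 = f on (0, 1), zero Dirichlet ends, scaled by h^2."""
    padded = np.concatenate(([0.0], u, [0.0]))
    return -padded[:-2] + 2.0 * u - padded[2:] + h**2 * (u**3 - f)


def jacobian(u, h):
    """Analytic dR/du, assembled explicitly as a sparse tridiagonal matrix."""
    off = -np.ones(u.size - 1)
    main = 2.0 + 3.0 * h**2 * u**2
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")


def sgs_preconditioner(J):
    """Approximate J^{-1} by (D+U)^{-1} D (D+L)^{-1} (sequential, for illustration)."""
    lower = sp.tril(J, format="csr")  # D + L
    upper = sp.triu(J, format="csr")  # D + U
    diag = J.diagonal()

    def apply(r):
        r = np.asarray(r, dtype=float).ravel()
        y = spsolve_triangular(lower, r, lower=True)             # forward sweep
        return spsolve_triangular(upper, diag * y, lower=False)  # backward sweep

    return LinearOperator(J.shape, matvec=apply, dtype=float)


def newton_gmres(n=50, max_newton=20, target=1e-10):
    """Newton iteration; each linear system J du = -R is solved by library GMRES."""
    h = 1.0 / (n + 1)
    f = np.ones(n)
    u = np.zeros(n)
    history = []  # nonlinear residual norm at the start of each Newton step
    for _ in range(max_newton):
        R = residual(u, h, f)
        history.append(float(np.linalg.norm(R)))
        if history[-1] < target:
            break
        J = jacobian(u, h)
        du, info = gmres(J, -R, M=sgs_preconditioner(J), atol=0.0)
        if info != 0:
            raise RuntimeError(f"GMRES did not converge (info={info})")
        u = u + du
    return u, history


u, history = newton_gmres()
print("Newton residual norms:", history)
```

Only `residual` and `jacobian` are specific to the problem; the Krylov iteration itself is left entirely to the library, which is the separation the summary argues for. A matrix-free variant would replace `jacobian` with finite-difference Jacobian-vector products.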