Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP

Bibliographic Details
Published in	Journal of parallel and distributed computing Vol. 195; p. 104977
Main Authors	McKevitt, James; Vorobyov, Eduard I.; Kulikov, Igor
Format	Journal Article
Language	English
Published	Elsevier Inc, 01.01.2025
Subjects	Coarray Fortran (CAF); CUDA Fortran; MPI; OpenMP; 68N15; 85-08; 68M14
Summary:	Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. Coarray Fortran (CAF), part of the Fortran 2008 standard introduced for parallel programming, facilitates distributed memory parallelism with a syntax familiar to Fortran programmers, simplifying the transition from single-processor to multi-processor coding. This research focuses on innovating and refining a parallel programming methodology that fuses the strengths of Intel Coarray Fortran, Nvidia CUDA Fortran, and OpenMP for distributed memory parallelism, high-speed GPU acceleration and shared memory parallelism respectively. We consider the management of pageable and pinned memory, CPU-GPU affinity in NUMA multiprocessors, and robust compiler interfacing with speed optimisation. We demonstrate our method through its application to a parallelised Poisson solver and compare the methodology, implementation, and scaling performance to that of the Message Passing Interface (MPI), finding CAF offers similar speeds with easier implementation. For new codes, this approach offers a faster route to optimised parallel computing. For legacy codes, it eases the transition to parallel computing, allowing their transformation into scalable, high-performance computing applications without the need for extensive re-design or additional syntax.
• Intel Coarray Fortran with Nvidia CUDA Fortran and OpenMP allows parallel computing without extensive code redesign.
• Coarray Fortran offers comparable performance to the Message Passing Interface (MPI) for distributed memory parallelism.
• This hybrid configuration shows near-linear scaling across different hardware setups.
• CPU-GPU affinity can be achieved when using this hybrid method and affects performance.
ISSN:	0743-7315
DOI:	10.1016/j.jpdc.2024.104977