A fast GPU Monte Carlo implementation for radiative heat transfer in graded-index media

Bibliographic Details
Published in Journal of quantitative spectroscopy & radiative transfer Vol. 269; p. 107680
Main Authors Shao, Jiang; Zhu, Keyong; Huang, Yong
Format Journal Article
Language English
Published Elsevier Ltd 01.07.2021
Summary:
• A fast GPU Monte Carlo implementation for radiative heat transfer in graded-index media.
• Optimizations for the performance of the Monte Carlo implementation based on the architecture of NVIDIA GPUs.
• Significant speedups compared with the equivalent CPU implementations using single-core/multi-core.
• The implementation is highly flexible and can be easily modified to simulate radiative heat transfer in a variety of cases with different geometries, boundary conditions, or media.
• The optimization methods for the GPU implementations, based on the architecture of NVIDIA GPUs, can serve as references for other researchers building GPU applications.

Simulating radiative heat transfer in a graded-index (GRIN) medium is particularly challenging because rays propagate along curved trajectories. The Monte Carlo method is effective, easy to implement, and highly precise. However, it is time consuming, and the computing time increases substantially when it is combined with the Runge-Kutta ray tracing technique to obtain the ray trajectories in the GRIN medium. Because the Monte Carlo method is ideally suited for parallel processing architectures and acceleration with graphics processing units (GPUs), we have developed a fast GPU Monte Carlo implementation for radiative heat transfer in GRIN media. The performance of the GPU implementation has been improved by combining the ray tracing process with the binary search and optimizing the code based on the architecture of GPUs. In particular, the utilization of the GPU hardware has been maximized, and the warp inactivity has been substantially reduced. Two- and three-dimensional GRIN medium models were evaluated to assess the accuracy and performance of the GPU implementations. Compared with the equivalent central processing unit (CPU) implementations, the GPU implementations provided in this paper produce physically accurate results with substantial speedups.
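The Runge-Kutta ray tracing mentioned above can be illustrated with a minimal Python sketch. It is not the authors' code. It integrates the ray equation d/ds(n dr/ds) = ∇n in the form dr/dt = T, dT/dt = ½∇(n²), where T = n dr/ds and dt = ds/n. The refractive-index profile n²(x, y) = 1 + G·y, the constant `G`, and all function names are assumptions chosen for illustration. The abstract does not say what the binary search is applied to; `step_to_wall` shows one hypothetical use, bisecting the last step size so the ray ends on a boundary plane.

```python
import math

G = 0.5  # assumed gradient of n^2 along y (illustrative value, not from the paper)

def n_squared(pos):
    """Assumed refractive-index profile: n^2(x, y) = 1 + G*y."""
    return 1.0 + G * pos[1]

def half_grad_n_squared(pos):
    """0.5 * grad(n^2) for the assumed profile (constant in this example)."""
    return (0.0, 0.5 * G)

def _axpy(a, b, h):
    """Return a + h*b for 2-component tuples."""
    return (a[0] + h * b[0], a[1] + h * b[1])

def rk4_step(pos, T, dt, accel=half_grad_n_squared):
    """One classical RK4 step of dr/dt = T, dT/dt = accel(r)."""
    k1r, k1T = T, accel(pos)
    k2r, k2T = _axpy(T, k1T, dt / 2), accel(_axpy(pos, k1r, dt / 2))
    k3r, k3T = _axpy(T, k2T, dt / 2), accel(_axpy(pos, k2r, dt / 2))
    k4r, k4T = _axpy(T, k3T, dt), accel(_axpy(pos, k3r, dt))
    new_pos = tuple(pos[i] + dt / 6 * (k1r[i] + 2 * k2r[i] + 2 * k3r[i] + k4r[i])
                    for i in range(2))
    new_T = tuple(T[i] + dt / 6 * (k1T[i] + 2 * k2T[i] + 2 * k3T[i] + k4T[i])
                  for i in range(2))
    return new_pos, new_T

def trace(pos, T, dt, n_steps):
    """Advance a ray by n_steps fixed RK4 steps."""
    for _ in range(n_steps):
        pos, T = rk4_step(pos, T, dt)
    return pos, T

def step_to_wall(pos, T, dt, wall_y, tol=1e-12):
    """Bisect the step size in [0, dt] so the ray ends on the plane y = wall_y.

    Assumes pos lies below the wall, a full step dt overshoots it, and y
    increases monotonically over the step.
    """
    lo, hi = 0.0, dt
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        trial_pos, _ = rk4_step(pos, T, mid)
        if trial_pos[1] < wall_y:
            lo = mid
        else:
            hi = mid
    return rk4_step(pos, T, hi)

# Launch a ray at 45 degrees from the origin, where n = 1, so T0 equals the direction.
direction = (math.cos(math.pi / 4), math.sin(math.pi / 4))
end_pos, end_T = trace((0.0, 0.0), direction, dt=0.1, n_steps=10)
hit_pos, hit_T = step_to_wall((0.0, 0.0), direction, dt=1.0, wall_y=0.5)
print(end_pos, end_T)
print(hit_pos, hit_T)
```

For this linear n² profile the exact trajectory is a parabola, so the RK4 result can be checked against the closed-form solution, and |T|² should equal n² at every point along the ray. In a GPU version each thread would typically trace its own ray with a loop of this kind.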
The speedup of the GPU implementation on a single GPU for the two-dimensional case reaches 43.13× against the equivalent CPU implementation using a single CPU core and 5.65× against the equivalent CPU implementation using 6 CPU cores (12 threads). The speedup of the GPU implementation on a single GPU for the three-dimensional case reaches 35.61× against the equivalent CPU implementation using a single CPU core and 2.07× against the equivalent CPU implementation using 14 CPU cores (28 threads).
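Dividing the two reported ratios for each case gives a rough estimate of how much the multi-core CPU run gained over the single-core run. The short Python sketch below does that arithmetic. It assumes that the single-core and multi-core baselines for a given case ran on the same processor, which the abstract does not state, so the derived figures are indicative only. The function name is hypothetical.

```python
def implied_cpu_scaling(gpu_vs_single_core, gpu_vs_multi_core, cores):
    """Return (multi-core speedup over one core, that speedup divided by core count).

    Assumes both CPU baselines were measured on the same processor.
    """
    scaling = gpu_vs_single_core / gpu_vs_multi_core
    return scaling, scaling / cores

# Speedups reported in the summary above.
two_d = implied_cpu_scaling(43.13, 5.65, cores=6)     # 2D case, 6 cores (12 threads)
three_d = implied_cpu_scaling(35.61, 2.07, cores=14)  # 3D case, 14 cores (28 threads)

print(f"2D: multi-core is about {two_d[0]:.2f}x one core")
print(f"3D: multi-core is about {three_d[0]:.2f}x one core")
```

Under that assumption the multi-core runs are roughly 7.6 times (2D) and 17.2 times (3D) faster than a single core.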
ISSN: 0022-4073, 1879-1352
DOI: 10.1016/j.jqsrt.2021.107680