Compute unified device architecture (CUDA)-based parallelization of WRF Kessler cloud microphysics scheme


Bibliographic Details
Published in Computers & Geosciences Vol. 52; pp. 292-299
Main Authors Mielikainen, Jarno; Huang, Bormin; Wang, Jun; Allen Huang, H.-L.; Goldberg, Mitchell D.
Format Journal Article
Language English
Published Elsevier Ltd 01.03.2013
Summary: In recent years, graphics processing units (GPUs) have emerged as a low-cost, low-power, and very high-performance alternative to conventional central processing units (CPUs). The latest GPUs offer a speedup of two to three orders of magnitude over CPUs for various science and engineering applications. The Weather Research and Forecasting (WRF) model is the latest-generation numerical weather prediction model. It is designed to serve both operational forecasting and atmospheric research needs, and it is useful for a broad spectrum of applications at domain scales ranging from meters to hundreds of kilometers. WRF computes an approximate solution to the differential equations that govern the air motion of the whole atmosphere. The Kessler microphysics module in WRF is a simple warm-cloud scheme that includes water vapor, cloud water, and rain. The microphysical processes modeled are the production, fall, and evaporation of rain. The accretion and auto-conversion of cloud water are also included, along with the production of cloud water from condensation. In this paper, we develop an efficient WRF Kessler microphysics scheme that runs on GPUs using the NVIDIA Compute Unified Device Architecture (CUDA). The GPU-based implementation of the Kessler microphysics scheme achieves a speedup of 70× over its single-threaded CPU counterpart. With a 4-GPU system, we achieve an overall speedup of 132× over the single-threaded CPU version.
► We accelerate WRF with an NVIDIA GPU.
► The corresponding speedup is 70×.
► A multi-GPU version of WRF is implemented.
► The speedup with 4 GPUs is 132×.
► Three main optimization steps of the GPU program are introduced.
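The two speedup figures in the summary can be related with a little arithmetic. The following is a minimal Python sketch, not code from the paper: the function name `scaling_report` and its parameters are hypothetical, and the only inputs taken from the abstract are the 70× single-GPU speedup, the 132× 4-GPU speedup, and the GPU count. It computes how much the 4-GPU run gains over a single GPU and the resulting parallel efficiency (that gain divided by the number of GPUs).

```python
def scaling_report(single_gpu_speedup: float,
                   multi_gpu_speedup: float,
                   n_gpus: int) -> dict:
    """Relate a single-GPU speedup to a multi-GPU speedup.

    Both speedups are assumed to be measured against the same
    single-threaded CPU baseline, as stated in the abstract.
    """
    # Gain of the multi-GPU run relative to one GPU.
    gain = multi_gpu_speedup / single_gpu_speedup
    # Fraction of ideal (linear) scaling actually achieved.
    efficiency = gain / n_gpus
    return {"gain_over_one_gpu": gain, "parallel_efficiency": efficiency}


# Figures from the abstract: 70x on one GPU, 132x on four GPUs.
report = scaling_report(70.0, 132.0, 4)
print(f"gain over one GPU:   {report['gain_over_one_gpu']:.2f}x")
print(f"parallel efficiency: {report['parallel_efficiency']:.0%}")
```

Under these figures, four GPUs run roughly 1.9 times faster than one, or about 47% of ideal linear scaling. The abstract does not state what limits the multi-GPU scaling, so the sketch draws no conclusion about the cause.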
Bibliography: http://dx.doi.org/10.1016/j.cageo.2012.10.006
ISSN: 0098-3004, 1873-7803
DOI: 10.1016/j.cageo.2012.10.006