Performance and Power Analysis of High-Density Multi-GPGPU Architectures: A Preliminary Case Study



Bibliographic Details
Published in: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp. 66-71
Main Authors: Yuxiang Gao, Saeed Iqbal, Peng Zhang, Meikang Qiu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2015

Summary: A system architecture with high-density general-purpose graphics processing units (GPGPUs) is emerging as a promising solution for building cluster supercomputers that offer high compute performance and high performance-per-watt. The raw compute power of these heterogeneous systems greatly exceeds that of the currently prevailing homogeneous systems, motivating their rapid adoption. However, these heterogeneous systems also increase the complexity of developing parallel applications, and there is a need to investigate the computational performance and associated power consumption of common benchmarks and scientific computing applications. In this paper, we present performance and power studies using the Dell C4130 server, which integrates up to four GPGPU cards, populated here with NVIDIA K80 GPGPUs. The High Performance Linpack (HPL) benchmark and the molecular dynamics (MD) simulators NAMD, LAMMPS and GROMACS are tested. Comparing a 4-K80 system with a 2-Xeon E5-2690 v3 system, we show that: (1) for HPL, the 4-GPU server delivers up to 7 TFLOPS, 9 times faster than the 2-CPU system, with a power efficiency of 4 GFLOPS per watt; (2) for MD, relative to the 2-CPU system, NAMD on the 4-GPU server achieves a 7.8 times speedup while consuming 2.3 times the power, LAMMPS achieves a 16 times speedup while consuming 2.6 times the power, and GROMACS achieves a 3.3 times speedup while consuming 2.6 times the power. These preliminary results demonstrate that the novel high-density multi-GPGPU architecture offers high performance for compute-intensive applications and molecular simulators, with superior power efficiency in a space-efficient design. In the future, such heterogeneous architectures could be a powerful alternative for next-generation supercomputer systems.
DOI: 10.1109/HPCC-CSS-ICESS.2015.68
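The summary reports speedup and power as separate ratios; the energy used for a fixed job follows from combining them (energy = average power × run time), which is what underlies the claim of superior power efficiency. The short Python sketch that follows is an illustration only and is not taken from the paper. It assumes that each speedup and power ratio were measured on the same run and that power is averaged over that run; the names `REPORTED`, `energy_ratio`, `energy_efficiency_gain` and `implied_hpl_power_watts` are hypothetical.

```python
# Illustrative arithmetic on the ratios quoted in the summary (not the authors' code).
# Assumption: each speedup and power ratio come from the same run, and power is
# the average draw over that run, so energy = power * time.

REPORTED = {
    # application: (speedup of 4-GPU over 2-CPU, power ratio 4-GPU / 2-CPU)
    "NAMD": (7.8, 2.3),
    "LAMMPS": (16.0, 2.6),
    "GROMACS": (3.3, 2.6),
}


def energy_ratio(speedup: float, power_ratio: float) -> float:
    """E_gpu / E_cpu = (P_gpu * t_gpu) / (P_cpu * t_cpu) = power_ratio / speedup,
    since t_gpu / t_cpu = 1 / speedup."""
    return power_ratio / speedup


def energy_efficiency_gain(speedup: float, power_ratio: float) -> float:
    """How many times less energy the GPU server needs for the same job."""
    return speedup / power_ratio


def implied_hpl_power_watts(tflops: float, gflops_per_watt: float) -> float:
    """Average power implied by an HPL rate and an efficiency figure.
    Assumption: both figures describe the same HPL run."""
    return tflops * 1000.0 / gflops_per_watt


if __name__ == "__main__":
    for app, (s, p) in REPORTED.items():
        print(f"{app:8s} energy ratio {energy_ratio(s, p):.2f}, "
              f"efficiency gain {energy_efficiency_gain(s, p):.1f}x")
    print(f"HPL implied power: {implied_hpl_power_watts(7.0, 4.0):.0f} W")
    # 9x faster than the 2-CPU system implies the CPU baseline rate.
    print(f"HPL implied 2-CPU rate: {7.0 / 9.0:.2f} TFLOPS")
```

Under those assumptions, the 4-GPU server would complete the same NAMD, LAMMPS and GROMACS jobs with roughly 3.4, 6.2 and 1.3 times less energy, respectively, and the HPL figures together would imply an average draw of about 1750 W if they describe a single run.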