SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs
SU3_Bench explores performance portability across multiple programming models using a simple but nontrivial mathematical kernel. This kernel has been derived from the (LQCD) code used in applications such as Hadron Physics and hence should be of interest to the scientific community. SU3_Bench has a...
Saved in:
Published in | High Performance Computing pp. 65 - 84 |
---|---|
Main Authors | , , , |
Format | Book Chapter |
Language | English |
Published |
Cham
Springer International Publishing
|
Series | Lecture Notes in Computer Science |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | SU3_Bench explores performance portability across multiple programming models using a simple but nontrivial mathematical kernel. This kernel has been derived from the (LQCD) code used in applications such as Hadron Physics and hence should be of interest to the scientific community.
SU3_Bench has a regular compute and data access pattern and on most traditional CPU and GPU-based systems, its performance is mainly determined by the achievable memory bandwidth. However, this paper shows that on the new Intel Programmable Integrated Unified Memory Architecture (PIUMA) that is designed for sparse workloads and has a balanced flops-to-byte ratio with scalar cores, SU3_Bench’s performance is determined by the total number of instructions that can be executed per cycle (pipeline throughput) rather than the usual bandwidth or flops. We show the performance analysis, porting, and optimizations of SU3_Bench on the PIUMA architecture and discuss how they are different from the standard NUMA CPUs (e.g., Xeon required NUMA optimizations whereas, on PIUMA, it was not necessary). We show iso-bandwidth and iso-power comparisons of SU3_Bench for PIUMA vs Xeon. We also show performance efficiency comparisons of SU3_Bench on PIUMA, Xeon, GPUs, and FPGAs based on pre-existing data. The lessons learned are generalizable to other similar kernels. |
---|---|
ISBN: | 3031073118 9783031073113 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-031-07312-0_4 |