Performance Portability Strategies for Grid C++ Expression Templates
Published in | EPJ Web of Conferences Vol. 175; p. 9006 |
---|---|
Format | Journal Article, Conference Proceeding |
Language | English |
Published | Les Ulis: EDP Sciences, 01.01.2018 |
Summary: | One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code. |
---|---|
ISSN: | 2100-014X, 2101-6275 |
DOI: | 10.1051/epjconf/201817509006 |