15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets
Published in | Future generation computer systems Vol. 162; p. 107472 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.01.2025 |
Subjects | |
Summary: | The diverse landscape of distributed heterogeneous computer systems currently available and being created to address computational challenges with the highest performance requirements presents daunting complexity for application developers. They must effectively decompose and distribute their application functionality and data, efficiently orchestrating the associated communication and synchronisation, on multi/manycore CPU processors with multiple attached acceleration devices structured within compute nodes with interconnection networks of various topologies.
Sophisticated compilers, runtime systems and libraries are (loosely) matched with debugging, performance measurement and analysis tools, with proprietary versions by integrators/vendors provided exclusively for their systems complemented by portable (primarily) open-source equivalents developed and supported by the international research community over many years. The Scalasca and Paraver toolsets are two widely employed examples of the latter, installed on personal notebook computers through to the largest leadership HPC systems. Over more than fifteen years their developers have worked closely together in numerous collaborative projects culminating in the creation of a universal parallel performance assessment and optimisation methodology focused on application execution efficiency and scalability, and the associated training and coaching of application developers (often in teams) in its productive use, reviewed in this article with lessons learnt therefrom.
• Open-source portable parallel application performance measurement/analysis toolsets.
• Quantification of parallel execution efficiency and scalability to extreme scale.
• Joint hands-on training with performance tools for HPC application developers.
• Demonstration of complementary capabilities of diverse performance tools.
• Expert coaching to explore and investigate performance tuning opportunities. |
---|---|
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2024.07.050 |