15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets

Bibliographic Details
Published in Future Generation Computer Systems Vol. 162; p. 107472
Main Authors Wylie, Brian J.N.; Giménez, Judit; Feld, Christian; Geimer, Markus; Llort, Germán; Mendez, Sandra; Mercadal, Estanislao; Visser, Anke; García-Gasulla, Marta
Format Journal Article
Language English
Published Elsevier B.V. 01.01.2025
Summary: The diverse landscape of distributed heterogeneous computer systems currently available and being created to address computational challenges with the highest performance requirements presents daunting complexity for application developers. They must effectively decompose and distribute their application functionality and data, efficiently orchestrating the associated communication and synchronisation, on multi/manycore CPU processors with multiple attached acceleration devices structured within compute nodes with interconnection networks of various topologies. Sophisticated compilers, runtime systems and libraries are (loosely) matched with debugging, performance measurement and analysis tools, with proprietary versions by integrators/vendors provided exclusively for their systems complemented by portable (primarily) open-source equivalents developed and supported by the international research community over many years. The Scalasca and Paraver toolsets are two widely employed examples of the latter, installed on personal notebook computers through to the largest leadership HPC systems. Over more than fifteen years their developers have worked closely together in numerous collaborative projects culminating in the creation of a universal parallel performance assessment and optimisation methodology focused on application execution efficiency and scalability, and the associated training and coaching of application developers (often in teams) in its productive use, reviewed in this article with lessons learnt therefrom.
• Open-source portable parallel application performance measurement/analysis toolsets.
• Quantification of parallel execution efficiency and scalability to extreme scale.
• Joint hands-on training with performance tools for HPC application developers.
• Demonstration of complementary capabilities of diverse performance tools.
• Expert coaching to explore and investigate performance tuning opportunities.
ISSN:0167-739X
DOI:10.1016/j.future.2024.07.050