Starlight: A kernel optimizer for GPU processing

Bibliographic Details
Published in	Journal of parallel and distributed computing Vol. 187; p. 104832
Main Authors	Zeni, Alberto; Del Sozzo, Emanuele; D'Arnese, Eleonora; Conficconi, Davide; Santambrogio, Marco D.
Format	Journal Article
Language	English
Published	Elsevier Inc 01.05.2024
Subjects	GPU; High performance computing; Performance analysis; Performance optimization; Roofline model
Summary:	Over the past few years, GPUs have found widespread adoption in many scientific domains, offering notable performance and energy efficiency advantages compared to CPUs. However, optimizing GPU high-performance kernels poses challenges given the complexities of GPU architectures and programming models. Moreover, current GPU development tools provide few high-level suggestions and overlook the underlying hardware. Here we present Starlight, an open-source, highly flexible tool for enhancing GPU kernel analysis and optimization. Starlight autonomously describes Roofline Models, examines performance metrics, and correlates these insights with GPU architectural bottlenecks. Additionally, Starlight predicts potential performance enhancements before altering the source code. We demonstrate its efficacy by applying it to genomics and physics applications from the literature, attaining speedups from 1.1× to 2.5× over state-of-the-art baselines. Furthermore, Starlight supports the development of new GPU kernels, which we exemplify through an image processing application, showing speedups of 12.7× and 140× when compared against state-of-the-art FPGA- and GPU-based solutions.
• We enrich the incomplete information provided by NVIDIA profilers.
• Starlight can support the development of an application from the ground up.
• Starlight predicts potential performance enhancements before altering the source code.
• Automatic Roofline Model generation for any CUDA-capable GPU.
• A qualitative overview of the various state-of-the-art solutions for GPU kernel optimization and Roofline Model generation.
ISSN:	0743-7315, 1096-0848
DOI:	10.1016/j.jpdc.2023.104832
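
Note on the Roofline Model named in the summary: the general model bounds a kernel's attainable throughput by the lower of the compute peak and the product of arithmetic intensity and memory bandwidth. The sketch below illustrates only that textbook bound; it is not Starlight's code or API. The function names and the peak figures (10000 GFLOP/s, 500 GB/s) are hypothetical values chosen for illustration and do not describe any real GPU.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: min(compute roof, arithmetic intensity * memory roof).

    arithmetic_intensity is in FLOP/byte, peak_gflops in GFLOP/s,
    peak_bandwidth_gbs in GB/s. All names here are illustrative.
    """
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def classify(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Label a kernel by which roof limits it under this simple model."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical device peaks, assumed for illustration only.
    PEAK_GFLOPS = 10000.0
    PEAK_BW_GBS = 500.0
    for ai in (2.0, 50.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BW_GBS)
        label = classify(ai, PEAK_GFLOPS, PEAK_BW_GBS)
        print(f"AI={ai} FLOP/byte: at most {bound} GFLOP/s ({label})")
```

Under these assumed peaks, a kernel whose arithmetic intensity lies below the ridge point is limited by memory bandwidth, so reducing memory traffic is the more promising direction; above the ridge point the compute roof applies.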