Analyzing GPU Energy Consumption in Data Movement and Storage

Bibliographic Details
Published in: Proceedings / IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp. 143-151
Main Authors: Delestrac, Paul; Miquel, Jonathan; Bhattacharjee, Debjyoti; Moolchandani, Diksha; Catthoor, Francky; Torres, Lionel; Novo, David
Format: Conference Proceeding
Language: English
Published: IEEE, 24.07.2024
ISSN: 2160-052X
DOI: 10.1109/ASAP61560.2024.00038

Summary: GPUs are the prevailing solution to execute high-performance tasks (e.g., machine learning training). As the peak performance of modern GPUs increases with each generation, so does their thermal design power (TDP). Hence, identifying energy bottlenecks in the GPU architecture is crucial to designing more efficient architectures in the future. However, due to the complex proprietary nature of modern GPU architectures, providing a detailed breakdown of the GPU energy consumption is not trivial. The goal of this work is to estimate a lower bound for the energy consumed by data movement and storage in modern GPU architectures, leveraging internal power sensors. We establish a basic energy model for modern GPUs, focused on data movement to/from the hardware-managed caches and software-managed memories. We propose a methodology to calibrate the energy model using microbenchmarks, performance counters, and the internal power sensor. We experimentally calibrate the model on an A100 NVIDIA GPU. Then, we challenge the consistency of the results by cross-validating with modified microbenchmarks with additional instructions. Finally, we use the calibrated energy model to evaluate breakdowns for workloads of increasing complexity (e.g., a ResNet-50 training iteration with different software optimizations). Our results show that data movement dominates the dynamic energy consumption of the GPU (up to 84%), with DRAM accesses being the main contributor.
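
The calibration step mentioned in the summary can be pictured as fitting a linear model in which dynamic energy is the sum, over memory levels, of access count times energy per access. The sketch below is only an illustration of that general idea, not the authors' method: the level names, the least-squares fit, the subtraction of a constant static power, and every number are assumptions made up for the example. Access counts are expressed in billions and per-access energies in nanojoules, so their product is in joules.

```python
# Hypothetical sketch of calibrating a linear per-access energy model.
# All names and numbers are illustrative assumptions, not values from the paper.

LEVELS = ["shared", "l1", "l2", "dram"]  # assumed memory levels


def dynamic_energy(total_energy_j, static_power_w, runtime_s):
    """Remove an assumed constant static share from a sensor-derived energy."""
    return total_energy_j - static_power_w * runtime_s


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def calibrate(access_counts, dynamic_energies):
    """Least-squares fit of per-access energies via the normal equations."""
    k = len(access_counts[0])
    ata = [[sum(row[i] * row[j] for row in access_counts) for j in range(k)]
           for i in range(k)]
    atb = [sum(row[i] * e for row, e in zip(access_counts, dynamic_energies))
           for i in range(k)]
    return solve_linear(ata, atb)


def breakdown(counts, per_access_nj):
    """Per-level dynamic energy in joules for one workload."""
    return {lvl: n * e for lvl, n, e in zip(LEVELS, counts, per_access_nj)}


# Made-up "ground truth" (nJ per access), used only to generate synthetic readings.
TRUE_NJ = [0.1, 0.2, 0.5, 2.0]
# Synthetic microbenchmarks: billions of accesses per level, one row per benchmark.
COUNTS = [[4, 0, 0, 0], [0, 4, 0, 0], [0, 3, 3, 0], [0, 2, 2, 2], [1, 1, 1, 1]]
STATIC_W, RUNTIME_S = 50.0, 1.0  # assumed static power and benchmark runtime

# Synthetic sensor readings: dynamic part plus static part.
readings = [sum(n * e for n, e in zip(row, TRUE_NJ)) + STATIC_W * RUNTIME_S
            for row in COUNTS]
dyn = [dynamic_energy(r, STATIC_W, RUNTIME_S) for r in readings]
estimated_nj = calibrate(COUNTS, dyn)

workload = breakdown([2, 5, 3, 1], estimated_nj)  # hypothetical workload counts
for level, joules in workload.items():
    print(f"{level}: {joules:.3f} J")
```

In practice the access counts would come from performance counters and the energy readings from the GPU's internal power sensor, as the summary describes; here both are synthetic so that the fit can be checked against the values used to generate them.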