A Regression-based Model for End-to-End Latency Prediction for DNN Execution on GPUs

Bibliographic Details
Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 343-345
Main Authors: Li, Ying; Sun, Yifan; Jog, Adwait
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023
DOI: 10.1109/ISPASS57527.2023.00047

Summary: Deep neural networks (DNNs) have become increasingly popular in many domains as they reduce the requirement for human effort. However, today's DNN applications suffer from high computational complexity and sub-optimal device utilization. To address this problem, researchers have been proposing new system design solutions, which require performance models to help them with pre-product concept validation. This paper discusses how to build a simple, yet accurate, performance model for DNNs on GPUs. Our observations demonstrate prevalent linear relationships between GPU execution times and the operation counts of DNN layers. Our proposed linear-regression-based execution time predictor can make predictions with an error rate of 28%.¹

¹ This material is based upon work supported in part by the Google Research Scholar Award and William & Mary. This work was performed in part using the computing facilities at William & Mary and Google Cloud. This work was done while Jog was with William & Mary. Jog is currently with the University of Virginia.
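As a rough illustration of the approach the abstract describes, the sketch below fits a least-squares line from per-layer operation counts to measured GPU execution times and sums the per-layer predictions into an end-to-end latency estimate. This is a minimal sketch, not the authors' code: the FLOP counts, timings, and the helper `predict_layer_time` are hypothetical placeholders for real profiling data.

```python
# Minimal sketch of a linear-regression-based layer-time predictor,
# assuming per-layer operation counts (FLOPs) and measured GPU times
# are already available from profiling. Data below is hypothetical.
import numpy as np

flops = np.array([1.2e9, 3.4e9, 0.8e9, 5.1e9, 2.2e9])   # per-layer FLOP counts
times_ms = np.array([0.9, 2.4, 0.7, 3.6, 1.6])           # measured times (ms)

# Fit time ~= a * flops + b by least squares, mirroring the observed
# linear relationship between operation count and execution time.
A = np.vstack([flops, np.ones_like(flops)]).T
(a, b), *_ = np.linalg.lstsq(A, times_ms, rcond=None)

def predict_layer_time(layer_flops: float) -> float:
    """Predict one layer's execution time (ms) from its FLOP count."""
    return a * layer_flops + b

# End-to-end latency estimate: sum of predicted per-layer times.
model_layers_flops = [1.5e9, 2.8e9, 4.0e9]
end_to_end_ms = sum(predict_layer_time(f) for f in model_layers_flops)
print(f"Predicted end-to-end latency: {end_to_end_ms:.2f} ms")
```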