ScanTAP: Balancing Throughput, Accuracy and Power Consumption for Concurrent DNN Execution on Heterogeneous Multi-Accelerator Edge Platforms
| Published in | IEEE International Conference on Edge Computing (Online), pp. 64-74 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 07.07.2025 |
| Subjects | |
| ISSN | 2767-9918 |
| DOI | 10.1109/EDGE67623.2025.00016 |
Summary: Modern edge System-on-Chips (SoCs), integrated with diverse heterogeneous accelerators sharing system memory, are increasingly used for deploying deep neural network (DNN) inference tasks. First-order metrics such as throughput, power consumption, and accuracy can vary significantly with the choice of accelerator and quantization precision. Applications such as smart traffic, surveillance, smart homes, and autonomous vehicles often require concurrent execution of multiple DNNs on a single device, making it essential to understand how multiple DNNs interact on multi-accelerator hardware and how that interaction affects these primary metrics. In this work, we propose ScanTAP, a unified framework that: (a) leverages static intrinsic features of DNN models, such as layer types and the number of floating-point and multiply-accumulate operations, together with their runtime characteristics on the hardware accelerators, such as memory and bus access patterns, to build a regression model that estimates the performance degradation of multiple DNNs competing for system resources and the resulting co-execution throughput; (b) employs a gradient-boosting prediction model to estimate the resulting co-execution SoC power consumption; (c) allows end users to specify the relative importance of the primary metrics according to their application needs; and (d) creates the mapping schedule using the proposed ScanMap algorithm, which identifies the best accelerator and quantization-precision combination for concurrent DNN execution while adhering to the user-defined importance of the primary metrics. Our experimental evaluations, conducted with various workload mixes of 10 standard DNNs on a real hardware platform featuring the RK3588S SoC integrated with a GPU, an NPU, and a Movidius VPU, demonstrate that ScanTAP incurs an average prediction error of 7.7% compared to a brute-force solution.
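The abstract describes the predictors and the ScanMap selection only at a high level, so the following is a minimal Python sketch of how the pieces could fit together, not the paper's implementation: a user-weighted score over predicted co-execution throughput, predicted SoC power (e.g., via a gradient-boosting regressor, as the abstract suggests), and per-precision accuracy, searched here by brute force over (accelerator, precision) assignments. All names (`scanmap_exhaustive`, `score`, the feature layout, and the toy predictors) are illustrative assumptions.

```python
# Hypothetical sketch; names, features, and search strategy are placeholders,
# not the paper's ScanMap algorithm or its trained models.
from itertools import product
from sklearn.ensemble import GradientBoostingRegressor  # only needed if training the power model below

ACCELERATORS = ["GPU", "NPU", "VPU"]
PRECISIONS = ["fp16", "int8"]

# (b) The power predictor could be a gradient-boosting regressor trained offline on
# co-execution profiles (per-model MACs, layer-type counts, measured memory/bus activity):
# power_gbm = GradientBoostingRegressor().fit(X_train, y_train_power)
# predict_power = lambda mapping: power_gbm.predict([features_of(mapping)])[0]

def score(mapping, weights, predict_throughput, predict_power, accuracy_table):
    """Weighted score of one candidate mapping {dnn: (accelerator, precision)}.

    In practice the three metrics would be normalized to comparable scales
    before applying the user-defined weights.
    """
    thr = predict_throughput(mapping)   # predicted co-execution throughput (regression model)
    pwr = predict_power(mapping)        # predicted co-execution SoC power (gradient boosting)
    acc = sum(accuracy_table[m, prec] for m, (_, prec) in mapping.items()) / len(mapping)
    # Higher throughput and accuracy are better; lower power is better.
    return weights["throughput"] * thr + weights["accuracy"] * acc - weights["power"] * pwr

def scanmap_exhaustive(dnns, weights, predict_throughput, predict_power, accuracy_table):
    """Brute-force stand-in for ScanMap: enumerate every (accelerator, precision)
    assignment and keep the best-scoring one."""
    best_mapping, best_score = None, float("-inf")
    for combo in product(product(ACCELERATORS, PRECISIONS), repeat=len(dnns)):
        mapping = dict(zip(dnns, combo))
        s = score(mapping, weights, predict_throughput, predict_power, accuracy_table)
        if s > best_score:
            best_mapping, best_score = mapping, s
    return best_mapping

# Toy usage with stand-in predictors (real ones would be trained offline):
dnns = ["resnet50", "yolov5s", "mobilenet_v2"]
weights = {"throughput": 0.5, "power": 0.3, "accuracy": 0.2}
acc_table = {(m, p): (0.76 if p == "fp16" else 0.74) for m in dnns for p in PRECISIONS}
best = scanmap_exhaustive(
    dnns, weights,
    predict_throughput=lambda m: 30.0 * len({a for a, _ in m.values()}),  # more distinct accelerators, less contention
    predict_power=lambda m: 3.0 + 1.5 * len({a for a, _ in m.values()}),  # more active accelerators, more power
    accuracy_table=acc_table,
)
print(best)
```

The real ScanMap presumably replaces this exhaustive enumeration with its own search; the sketch only illustrates how user-defined metric importance could steer the choice of accelerator and precision per DNN.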