ScanTAP: Balancing Throughput, Accuracy and Power Consumption for Concurrent DNN Execution on Heterogeneous Multi-Accelerator Edge Platforms
| Published in | IEEE International Conference on Edge Computing (Online), pp. 64-74 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 07.07.2025 |
| Subjects | |
| ISSN | 2767-9918 |
| DOI | 10.1109/EDGE67623.2025.00016 |
Summary: Modern edge System-on-Chips (SoCs), integrated with diverse heterogeneous accelerators sharing system memory, are increasingly used for deploying deep neural network (DNN) inference tasks. First-order metrics such as throughput, power consumption, and accuracy can vary significantly with the choice of accelerator and quantization precision. Applications such as smart traffic, surveillance, smart homes, and autonomous vehicles often require concurrent execution of multiple DNNs on a single device, making it essential to understand how multiple DNNs interact on multi-accelerator hardware and how that interaction affects these primary metrics. In this work, we propose ScanTAP, a unified framework that: (a) leverages static intrinsic features of DNN models, such as layer types and the number of floating-point and multiply-accumulate operations, together with their runtime characteristics on the hardware accelerators, such as memory and bus access patterns, to build a regression model that estimates the performance degradation of multiple DNNs competing for system resources and the resulting co-execution throughput; (b) employs a gradient-boosting prediction model to estimate the resulting co-execution SoC power consumption; (c) allows end users to specify the relative importance of the primary metrics according to their application needs; and (d) creates the mapping schedule using the proposed ScanMap algorithm, which identifies the best accelerator and quantization-precision combination for concurrent DNN execution while adhering to the user-defined importance of the primary metrics. Our experimental evaluations, conducted with various workload mixes of 10 standard DNNs on a real hardware platform featuring the RK3588S SoC integrated with a GPU, an NPU, and a Movidius VPU, demonstrate that ScanTAP incurs an average prediction error of 7.7% compared to a brute-force solution.
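The abstract describes the predictors and the ScanMap selection only at a high level, so the following is a minimal Python sketch of how the pieces could fit together, not the paper's implementation: a user-weighted score over predicted co-execution throughput, predicted SoC power (e.g., via a gradient-boosting regressor, as the abstract suggests), and per-precision accuracy, searched here by brute force over (accelerator, precision) assignments. All names (`scanmap_exhaustive`, `score`, the feature layout, and the toy predictors) are illustrative assumptions.

```python
# Hypothetical sketch; names, features, and search strategy are placeholders,
# not the paper's ScanMap algorithm or its trained models.
from itertools import product
from sklearn.ensemble import GradientBoostingRegressor  # only needed if training the power model below

ACCELERATORS = ["GPU", "NPU", "VPU"]
PRECISIONS = ["fp16", "int8"]

# (b) The power predictor could be a gradient-boosting regressor trained offline on
# co-execution profiles (per-model MACs, layer-type counts, measured memory/bus activity):
# power_gbm = GradientBoostingRegressor().fit(X_train, y_train_power)
# predict_power = lambda mapping: power_gbm.predict([features_of(mapping)])[0]

def score(mapping, weights, predict_throughput, predict_power, accuracy_table):
    """Weighted score of one candidate mapping {dnn: (accelerator, precision)}.

    In practice the three metrics would be normalized to comparable scales
    before applying the user-defined weights.
    """
    thr = predict_throughput(mapping)   # predicted co-execution throughput (regression model)
    pwr = predict_power(mapping)        # predicted co-execution SoC power (gradient boosting)
    acc = sum(accuracy_table[m, prec] for m, (_, prec) in mapping.items()) / len(mapping)
    # Higher throughput and accuracy are better; lower power is better.
    return weights["throughput"] * thr + weights["accuracy"] * acc - weights["power"] * pwr

def scanmap_exhaustive(dnns, weights, predict_throughput, predict_power, accuracy_table):
    """Brute-force stand-in for ScanMap: enumerate every (accelerator, precision)
    assignment and keep the best-scoring one."""
    best_mapping, best_score = None, float("-inf")
    for combo in product(product(ACCELERATORS, PRECISIONS), repeat=len(dnns)):
        mapping = dict(zip(dnns, combo))
        s = score(mapping, weights, predict_throughput, predict_power, accuracy_table)
        if s > best_score:
            best_mapping, best_score = mapping, s
    return best_mapping

# Toy usage with stand-in predictors (real ones would be trained offline):
dnns = ["resnet50", "yolov5s", "mobilenet_v2"]
weights = {"throughput": 0.5, "power": 0.3, "accuracy": 0.2}
acc_table = {(m, p): (0.76 if p == "fp16" else 0.74) for m in dnns for p in PRECISIONS}
best = scanmap_exhaustive(
    dnns, weights,
    predict_throughput=lambda m: 30.0 * len({a for a, _ in m.values()}),  # more distinct accelerators, less contention
    predict_power=lambda m: 3.0 + 1.5 * len({a for a, _ in m.values()}),  # more active accelerators, more power
    accuracy_table=acc_table,
)
print(best)
```

The real ScanMap presumably replaces this exhaustive enumeration with its own search; the sketch only illustrates how user-defined metric importance could steer the choice of accelerator and precision per DNN.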