DRL-based Multi-Stream Scheduling of Inference Pipelines on Edge Devices
Published in | 2024 37th International Conference on VLSI Design and 2024 23rd International Conference on Embedded Systems (VLSID), pp. 324-329 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.01.2024 |
Summary: | Real-time scheduling of multiple neural-network inference pipelines on Graphics Processing Unit (GPU) based edge devices is an active research area. Applications such as Advanced Driver-Assistance Systems (ADAS) execute several such inference pipelines to make informed decisions on driving scenarios. The real-time performance of ADAS is often constrained by platform resource limitations, incurring execution latency that ultimately leads to deadline violations. Modern GPUs support concurrent execution of multiple compute streams; however, the literature lacks scheduling strategies that exploit multiple such streams for concurrent execution of inference pipelines. In this paper, we address this issue by proposing a Deep Reinforcement Learning (DRL) based solution for multi-stream scheduling of inference pipelines on edge GPUs. Using DRL, we learn to map every layer of the target inference pipelines to a high- or low-priority stream while satisfying task-level deadline requirements. The experimental evaluation shows the efficacy of the proposed approach compared with several baseline approaches. |
---|---|
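The summary describes mapping each layer of an inference pipeline to a high- or low-priority GPU stream subject to a deadline. As a rough illustration only (not the authors' DRL method), a greedy baseline might promote the heaviest layers to the high-priority stream when the serial latency estimate exceeds the deadline. All names, the profiled latencies, and the contention model below are hypothetical:

```python
# Hypothetical sketch of layer-to-stream assignment: a DRL agent in the
# paper would learn this mapping; here a simple greedy heuristic stands in.
from dataclasses import dataclass

HIGH, LOW = 0, 1  # illustrative stream-priority labels


@dataclass
class Layer:
    name: str
    est_latency_ms: float  # profiled per-layer latency (assumed input)


def assign_streams(layers, deadline_ms):
    """Greedy baseline: if the serial latency estimate overruns the
    deadline, promote the heaviest layers to the high-priority stream,
    assuming (hypothetically) that priority roughly halves the
    contention-induced delay of a promoted layer."""
    total = sum(l.est_latency_ms for l in layers)
    overrun = total - deadline_ms
    mapping = {l.name: LOW for l in layers}
    for layer in sorted(layers, key=lambda l: l.est_latency_ms, reverse=True):
        if overrun <= 0:
            break
        mapping[layer.name] = HIGH
        overrun -= layer.est_latency_ms * 0.5  # assumed contention saving
    return mapping
```

For example, a three-layer pipeline with latencies 4 ms, 8 ms, and 2 ms against a 12 ms deadline overruns by 2 ms, so only the heaviest layer gets promoted. A learned policy would replace this hand-tuned rule with per-layer decisions trained against observed deadline misses.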
ISSN: | 2380-6923 |
DOI: | 10.1109/VLSID60093.2024.00060 |