Application Level Resource Scheduling for Deep Learning Acceleration on MPSoC

Deep Neutral Networks (DNNs) have been widely used in many applications, such as self-driving cars, natural language processing (NLP), image classification, visual object recognition, and so on. Field-programmable gate array (FPGA) based Multiprocessor System on a Chip (MPSoC) is recently considered...

Full description

Saved in:
Bibliographic Details
Published inJournal of signal processing systems Vol. 95; no. 10; pp. 1231 - 1243
Main Authors Gao, Cong, Saha, Sangeet, Zhu, Xuqi, Jing, Hongyuan, McDonald-Maier, Klaus D., Zhai, Xiaojun
Format Journal Article
LanguageEnglish
Published New York Springer US 01.10.2023
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Deep Neutral Networks (DNNs) have been widely used in many applications, such as self-driving cars, natural language processing (NLP), image classification, visual object recognition, and so on. Field-programmable gate array (FPGA) based Multiprocessor System on a Chip (MPSoC) is recently considered one of the popular choices for deploying DNN models. However, the limited resource capacity of MPSoC imposes a challenge for such practical implementation. Recent studies revealed the trade-off between the “resources consumed" vs. the “performance achieved". Taking a cue from these findings, we address the problem of efficient implementation of deep learning into the resource-constrained MPSoC in this paper, where each deep learning network is run with different service levels based on resource usage (where a higher service level implies higher performance with increased resource consumption). To this end, we propose a heuristic-based strategy, Application Wise Level Selector (AWLS), for selecting service levels to maximize the overall performance subject to a given resource bound. AWLS can achieve higher performance within a constrained resource budget under various simulation scenarios. Further, we verify the proposed strategy using an AMD-Xilinx Zynq UltraScale+ XCZU9EG SoC. Using a framework designed to deploy multi-DNN on multi-DPUs (Deep Learning Units), it is proved that an optimal solution is achieved from the algorithm, which obtains the highest performance (Frames Per Second) using the same resource budget.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1939-8018
1939-8115
DOI:10.1007/s11265-023-01881-9