Power- and Time-Aware Deep Learning Inference for Mobile Embedded Devices
Published in: IEEE Access, Vol. 7, pp. 3778-3789
Main Authors:
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2019
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2018.2887099
Summary: Deep learning is a state-of-the-art approach that provides highly accurate inference for many cyber-physical systems (CPS) such as autonomous cars and robots. Deep learning inference often needs to be performed locally on mobile and embedded devices, rather than in the cloud, to address concerns such as latency, power consumption, and limited bandwidth. However, existing approaches have focused on delivering "best-effort" performance to resource-constrained mobile embedded devices, resulting in unpredictable performance under highly variable environments of CPS. In this paper, we propose a novel deep learning inference runtime, called DeepRT, that supports multiple QoS objectives simultaneously against unpredictable workloads. In DeepRT, the multiple inputs/multiple outputs (MIMO) modeling and control methodology is proposed as a primary tool to support multiple QoS goals including the inference latency and power consumption. DeepRT's MIMO controller coordinates multiple computing resources, such as CPUs and GPUs, by capturing their close interactions and effects on multiple QoS objectives. We demonstrate the viability of DeepRT's QoS management architecture by implementing a prototype of DeepRT. The evaluation results demonstrate that, compared with baseline approaches, DeepRT can support the desired inference latency as well as power consumption for various deep learning models in a highly robust manner.
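The summary's core idea, a single MIMO controller steering several actuators (CPU and GPU clock levels) toward several QoS targets (inference latency and power) at once, can be made concrete with a small sketch. Everything below is an assumption for illustration: the linear plant model `y = C0 + B @ u`, the gain choice `K = 0.5 * inv(B)`, and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np

# Minimal sketch of a MIMO feedback loop in the spirit of the summary:
# one controller adjusts two knobs to track two QoS targets together.
# Plant model, gains, and numbers are illustrative assumptions only.

# Assumed linear plant: y = C0 + B @ u, identified offline.
#   u = [cpu_level, gpu_level], normalized to [0, 1]
#   y = [latency_ms, power_mw]
B  = np.array([[-120.0,  -80.0],    # raising clocks lowers latency...
               [ 600.0,  900.0]])   # ...and raises power draw
C0 = np.array([ 250.0, 1200.0])     # latency/power at the lowest clocks

# Integral MIMO controller: u[k+1] = u[k] + K @ e[k]. Choosing
# K = 0.5 * inv(B) decouples the two loops and halves the QoS error
# each step, since e[k+1] = (I - B @ K) @ e[k] = 0.5 * e[k].
K = 0.5 * np.linalg.inv(B)

setpoint = np.array([120.0, 2200.0])  # QoS targets: 120 ms, 2.2 W
u = np.array([0.5, 0.5])              # initial CPU/GPU levels

for k in range(10):
    y = C0 + B @ u                    # "measure" latency and power
    e = setpoint - y                  # error in both QoS objectives
    u = np.clip(u + K @ e, 0.0, 1.0)  # coordinated actuator update
    print(f"k={k}: latency={y[0]:6.1f} ms  power={y[1]:6.0f} mW")
```

In practice the plant would be identified from measurements on the target device and the gains designed for robustness to model error and workload variation; the fixed linear model here only makes the structure of the control loop visible.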