Weight Separation for Memory-Efficient and Accurate Deep Multitask Learning

Bibliographic Details
Published in: Proceedings of the IEEE International Conference on Pervasive Computing and Communications, pp. 13-22
Main Authors: Lee, Seulki; Nirjon, Shahriar
Format: Conference Proceeding
Language: English
Published: IEEE, 21.03.2022
ISSN: 2474-249X
DOI: 10.1109/PerCom53586.2022.9762400

Summary: We propose a new concept called Weight Separation of deep neural networks (DNNs), which enables memory-efficient and accurate deep multitask learning on a memory-constrained embedded system. The goal of weight separation is to achieve extreme packing of multiple heterogeneous DNNs into the limited memory of the system while simultaneously preserving the prediction accuracy of the constituent DNNs. The proposed approach separates the DNN weights into two types of weight-pages, each consisting of a subset of weight parameters: shared and exclusive weight-pages. It optimally distributes the weight-pages across two levels of the system memory hierarchy and stores them separately, i.e., the shared weight-pages in primary (level-1) memory (e.g., RAM) and the exclusive weight-pages in secondary (level-2) memory (e.g., flash disk or SSD). First, to reduce the memory usage of multiple DNNs, less critical weight parameters are identified and overlapped onto the shared weight-pages, which are deployed in the limited space of the primary (main) memory. Next, to retain the prediction accuracy of the multiple DNNs, the essential weight parameters that play a critical role in preserving prediction accuracy are stored intact, without overlapping, in the plentiful space of secondary memory in the form of exclusive weight-pages. We implement two real systems applying the proposed weight separation: 1) a microcontroller-based multitask IoT system that performs multitask learning of 10 scaled-down DNNs by separating the weight parameters into FRAM and flash disk, and 2) an embedded GPU system that performs multitask learning of 10 state-of-the-art DNNs, separating the weight parameters into GPU RAM and eMMC. Our evaluation shows that the memory efficiency, prediction accuracy, and execution time of deep multitask learning improve by up to 5.9x, 2.0%, and 13.1x, respectively, without any modification of the DNN models.
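
The following minimal Python sketch illustrates the core idea described in the summary: weights are split into fixed-size pages, each page is scored for criticality, low-criticality pages are overlapped in a shared pool standing in for primary (level-1) memory, and critical pages are stored intact per task on disk standing in for secondary (level-2) memory. This is an illustration only, not the authors' implementation; the page size, the criticality score, the page-sharing key, and every identifier (WeightSeparator, PAGE_SIZE, CRITICALITY_THRESHOLD, etc.) are assumptions made for demonstration.

    # Illustrative sketch only -- not the paper's implementation. Page size,
    # criticality score, sharing key, and all identifiers are assumptions.
    import os
    import tempfile
    import numpy as np

    PAGE_SIZE = 256               # weight parameters per page (assumed)
    CRITICALITY_THRESHOLD = 0.24  # toy cutoff separating shared/exclusive pages

    def paginate(weights, page_size=PAGE_SIZE):
        """Split a flat weight vector into fixed-size, zero-padded pages."""
        pad = (-len(weights)) % page_size
        return np.pad(weights, (0, pad)).reshape(-1, page_size)

    def criticality(page):
        """Toy criticality score: mean absolute weight magnitude of the page."""
        return float(np.abs(page).mean())

    class WeightSeparator:
        """Keeps low-criticality pages in a shared in-memory pool (level-1)
        and stores high-criticality pages intact on disk (level-2)."""

        def __init__(self, level2_dir):
            self.shared_pool = {}         # simulated primary memory (RAM/FRAM)
            self.level2_dir = level2_dir  # simulated secondary memory (flash/eMMC)
            self.page_maps = {}           # task_id -> list of page references
            self.lengths = {}             # task_id -> original weight count

        def register_task(self, task_id, weights):
            layout = []
            for i, page in enumerate(paginate(weights)):
                if criticality(page) < CRITICALITY_THRESHOLD:
                    # Less critical pages are overlapped: tasks whose pages
                    # coarsely match share one slot in primary memory.
                    key = tuple(np.round(page, 1).tolist())
                    self.shared_pool.setdefault(key, page)
                    layout.append(("shared", key))
                else:
                    # Critical pages are kept intact, one copy per task, on disk.
                    path = os.path.join(self.level2_dir, f"{task_id}_p{i}.npy")
                    np.save(path, page)
                    layout.append(("exclusive", path))
            self.page_maps[task_id] = layout
            self.lengths[task_id] = len(weights)

        def load_weights(self, task_id):
            """Reassemble a task's weights from shared and exclusive pages."""
            pages = [self.shared_pool[ref] if kind == "shared" else np.load(ref)
                     for kind, ref in self.page_maps[task_id]]
            return np.concatenate(pages)[: self.lengths[task_id]]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        sep = WeightSeparator(tempfile.mkdtemp())
        for t in ("taskA", "taskB"):
            sep.register_task(t, rng.normal(scale=0.3, size=1000).astype(np.float32))
        w = sep.load_weights("taskA")
        print("reconstructed:", w.shape,
              "| pages held in primary memory:", len(sep.shared_pool))

Running the sketch reconstructs each task's full weight vector from a mix of pooled and per-task pages. A real system along the lines of the paper would drive the shared/exclusive split with an accuracy-aware criterion and an optimal placement across the actual memory hierarchy, rather than the toy magnitude score and matching key used here.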