Fully Dynamic Inference With Deep Neural Networks

Bibliographic Details
Published in	IEEE transactions on emerging topics in computing Vol. 10; no. 2; pp. 962 - 972
Main Authors	Xia, Wenhan, Yin, Hongxu, Dai, Xiaoliang, Jha, Niraj K.
Format	Journal Article
Language	English
Published	New York IEEE 01.04.2022 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accuracy; Adaptation models; Artificial neural networks; Autonomous cars; Channels; Computational efficiency; Computational modeling; Computer architecture; Computing costs; Conditional computation; Datasets; deep learning; dynamic execution; dynamic inference; Floating point arithmetic; Inference; model compression; Network latency; Neural networks; Neurons; Quantization (signal); Task analysis

More Information
Summary:	Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction. Their cross-domain success, however, is often achieved at the expense of computational cost, high memory bandwidth, and long inference latency, which prevents their deployment in resource-constrained and time-sensitive scenarios, such as edge-side inference and self-driving cars. While recently developed methods for creating efficient deep neural networks are making their real-world deployment more feasible by reducing model size, they do not fully exploit input properties on a per-instance basis to maximize computational efficiency and task accuracy. In particular, most existing methods typically use a one-size-fits-all approach that identically processes all inputs. Motivated by the fact that different images require different feature embeddings to be accurately classified, we propose a fully dynamic paradigm that imparts deep convolutional neural networks with hierarchical inference dynamics at the level of layers and individual convolutional filters/channels. Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped. L-Net and C-Net also learn how to scale retained computation outputs to maximize task accuracy. By integrating L-Net and C-Net into a joint design framework, called LC-Net, we consistently outperform state-of-the-art dynamic frameworks with respect to both efficiency and classification accuracy. 
On the CIFAR-10 dataset, LC-Net results in up to 11.9× fewer floating-point operations (FLOPs) and up to 3.3 percent higher accuracy compared to other dynamic inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4× fewer FLOPs and up to 4.6 percent higher Top-1 accuracy than the other methods.
ISSN:	2168-6750
DOI:	10.1109/TETC.2021.3056031
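
Illustrative sketch (not from the article): the summary describes per-instance gates that decide which layers and channels to skip and that also scale the outputs that are kept. The toy Python below shows only that control flow. Everything in it is an assumption made for illustration: the names `make_layer` and `gated_forward` are hypothetical, the gate values are passed in by hand rather than predicted by L-Net and C-Net, and the residual shortcut used to bypass a skipped layer is a guess at how skipping could work, not the authors' architecture.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector, Sequence[float]], Vector]


def make_layer(weights: Sequence[float]) -> Layer:
    """Hypothetical toy layer: channel c computes weights[c] * x[c]."""

    def layer(x: Vector, channel_gates: Sequence[float]) -> Vector:
        out = []
        for w, xi, g in zip(weights, x, channel_gates):
            # A zero gate means the channel is skipped (no computation);
            # a non-zero gate scales the retained channel output.
            out.append(0.0 if g == 0.0 else g * w * xi)
        return out

    return layer


def gated_forward(
    x: Vector,
    layers: Sequence[Layer],
    layer_gates: Sequence[float],
    channel_gates: Sequence[Sequence[float]],
) -> Tuple[Vector, int]:
    """Run a stack of layers with per-instance layer and channel gates.

    Assumption: each layer sits on a residual shortcut, so a skipped
    layer simply passes its input through unchanged.
    Returns the final activations and the number of skipped layers.
    """
    skipped = 0
    for layer, g_layer, g_channels in zip(layers, layer_gates, channel_gates):
        if g_layer == 0.0:
            skipped += 1
            continue  # whole layer skipped for this input
        out = layer(x, g_channels)
        # The layer gate scales the retained layer output.
        x = [xi + g_layer * oi for xi, oi in zip(x, out)]
    return x, skipped


# Hand-picked gates standing in for what gate predictors would output.
layers = [make_layer([1.0, 1.0]), make_layer([2.0, 2.0])]
y, n_skipped = gated_forward(
    [1.0, 2.0],
    layers,
    layer_gates=[0.0, 0.5],                  # skip layer 0, scale layer 1
    channel_gates=[[1.0, 1.0], [1.0, 0.0]],  # layer 1 skips channel 1
)
print(y, n_skipped)
```

In this toy run the first layer is bypassed entirely and the second layer computes only one of its two channels, which is the kind of per-input saving the summary attributes to layer-level and channel-level gating.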