PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning

Although pruning is an effective technique to reduce the number of weights in deep neural networks (DNNs), it remains challenging for the resulting sparse networks to perform low-latency inference on everyday hardware. This problem is mainly caused by the incompatibility between the unstructured spa...

Bibliographic Details
Published in	IEEE Transactions on Computers, Vol. 73, no. 11, pp. 2576-2589
Main Authors	Geng, Hanfei; Liu, Yifei; Zheng, Yujie; Zhang, Li Lyna; Sun, Jingwei; Wang, Yujing; Wang, Yang; Sun, Guangzhong; Yang, Mao; Cao, Ting; Liu, Yunxin
Format	Journal Article
Language	English
Published	IEEE 01.11.2024
Subjects	Accuracy; Artificial neural networks; block pruning; Deep neural network; Hardware; Optimization; Shape; sparse kernels; Sparse matrices; Sun
