PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning

Although pruning is an effective technique to reduce the number of weights in deep neural networks (DNNs), it remains challenging for the resulting sparse networks to perform low-latency inference on everyday hardware. This problem is mainly caused by the incompatibility between the unstructured spa...

Bibliographic Details
Published in IEEE transactions on computers Vol. 73; no. 11; pp. 2576-2589
Main Authors Geng, Hanfei; Liu, Yifei; Zheng, Yujie; Zhang, Li Lyna; Sun, Jingwei; Wang, Yujing; Wang, Yang; Sun, Guangzhong; Yang, Mao; Cao, Ting; Liu, Yunxin
Format Journal Article
Language English
Published IEEE 01.11.2024