PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning
Although pruning is an effective technique to reduce the number of weights in deep neural networks (DNNs), it remains challenging for the resulting sparse networks to perform low-latency inference on everyday hardware. This problem is mainly caused by the incompatibility between the unstructured spa...
Published in | IEEE Transactions on Computers, Vol. 73, no. 11, pp. 2576-2589 |
---|---|
Main Authors | , , , , , , , , , , |
Format | Journal Article |
Language | English |
Published | IEEE, 01.11.2024 |