Quantized Sparse Weight Decomposition for Neural Network Compression
In this paper, we introduce a novel method for neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to...
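The abstract describes storing a weight tensor as a product of sparse, quantized factors that is recomputed at inference time. The following is a minimal NumPy sketch of that idea, not the paper's actual algorithm: it factorizes a weight matrix via SVD and then projects each factor onto a sparse, quantized set, mimicking the projection step of projected gradient descent. All names, shapes, the rank `r`, the sparsity ratio, and the quantization scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, num_levels=256):
    # Uniform quantization to `num_levels` levels (an assumed, simplified scheme).
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (num_levels - 1)
    return np.round((x - lo) / scale) * scale + lo

def sparsify(x, keep_ratio=0.5):
    # Magnitude pruning: keep the largest-magnitude entries, zero the rest.
    k = max(1, int(x.size * keep_ratio))
    thresh = np.sort(np.abs(x), axis=None)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

# Dense target weights (stand-in for one layer's weight tensor).
W = rng.standard_normal((64, 64))

# Factorize, then project each factor onto sparse, quantized sets.
# Projected gradient descent would alternate gradient updates on the
# factors with projections like these; here we project only once.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 32  # assumed factorization rank
A = quantize(sparsify(U[:, :r] * S[:r]))
B = quantize(sparsify(Vt[:r, :]))

# On-the-fly reconstruction at inference time: only A and B are stored.
W_hat = A @ B
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

Storage savings come from keeping only the nonzero, low-bit entries of `A` and `B` instead of the dense `W`; the single matrix product to rebuild `W` is the on-the-fly cost at inference.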
Published in: arXiv.org
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 22.07.2022