Quantized Sparse Weight Decomposition for Neural Network Compression

In this paper, we introduce a novel method for neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to...
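The idea above can be sketched in a few lines of NumPy. This is an illustrative toy only, not the paper's exact formulation: the low-rank shapes, the magnitude-pruning step, and the symmetric uniform quantization scheme are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank factorization of a weight matrix: W ≈ A @ B.
m, k, n = 64, 8, 32
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

# Sparsify the factors (assumed magnitude pruning).
A[np.abs(A) < 0.5] = 0.0
B[np.abs(B) < 0.5] = 0.0

def quantize(x, num_bits=8):
    """Symmetric uniform quantization to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

# Only the quantized sparse factors and their scales are stored.
qA, sA = quantize(A)
qB, sB = quantize(B)

def reconstruct(qA, sA, qB, sB):
    """Dequantize the factors and form the weights on the fly."""
    return (qA.astype(np.float32) * sA) @ (qB.astype(np.float32) * sB)

# At inference time, the target weights are regenerated from the factors.
W_hat = reconstruct(qA, sA, qB, sB)
print(W_hat.shape)  # (64, 32)
```

The storage saving comes from keeping only the two small integer factors (plus two scales) instead of the full floating-point matrix; the multiplication is deferred to inference time.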


Bibliographic Details
Published in: arXiv.org
Main Authors: Kuzmin, Andrey; van Baalen, Mart; Nagel, Markus; Behboodi, Arash
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 22.07.2022
