NEURAL NETWORK COMPRESSION
Format | Patent
---|---
Language | English
Published | 03.03.2022
Summary: A neural network model is trained, where the training includes multiple training iterations. Weights of a particular layer of the neural network are pruned during a forward pass of a particular one of the training iterations. During the same forward pass of the particular training iteration, values of weights of the particular layer are quantized to determine a quantized-sparsified subset of weights for the particular layer. A compressed version of the neural network model is generated from the training based at least in part on the quantized-sparsified subset of weights.
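The abstract describes pruning and quantizing a layer's weights within the same forward pass, yielding a quantized-sparsified subset from which the compressed model is built. A minimal sketch of that idea follows; the patent does not name its pruning criterion or quantization scheme, so magnitude-based pruning and symmetric uniform quantization are assumed here purely for illustration, and the function names are hypothetical.

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.5, num_bits=8):
    """Illustrative only: magnitude-prune a fraction of the weights, then
    uniformly quantize the survivors, all in one step (as would happen
    inside a single forward pass)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # Threshold at the k-th smallest magnitude; smaller weights are pruned.
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = np.abs(weights) >= threshold
    pruned = weights * mask
    # Symmetric uniform "fake" quantization: snap surviving weights to an
    # integer grid, then rescale back to floats.
    max_abs = float(np.max(np.abs(pruned)))
    scale = max_abs / (2 ** (num_bits - 1) - 1) if max_abs > 0 else 1.0
    quantized = np.round(pruned / scale) * scale
    # The quantized-sparsified subset: quantized values where the mask holds.
    return quantized * mask, mask

def forward_pass(x, weights):
    # Prune and quantize during the same forward pass, as the abstract
    # describes; the returned weights are what a compressed model retains.
    qs_weights, _ = prune_and_quantize(weights)
    return x @ qs_weights, qs_weights
```

Doing both operations inside the training loop (rather than after training) lets subsequent backward passes adapt the remaining weights to the pruning and quantization error, which is the usual motivation for in-training compression.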
Bibliography: Application Number: US201917416461