PER-EMBEDDING-GROUP ACTIVATION QUANTIZATION

Bibliographic Details
Main Authors: NAGEL, Markus; BLANKEVOORT, Tijmen Pieter Frederik; BONDARENKO, Yelysei
Format: Patent
Language: English, French, German
Published: 04.09.2024

Summary: A processor-implemented method for providing per-embedding-group activation quantization includes receiving sequential data at a first layer of a transformer neural network. The sequential data is processed via the first layer of the transformer neural network to generate an activation tensor. The activation tensor is split into multiple groups of embeddings, each of which has a different set of quantization parameters. Each embedding group is quantized separately based on its corresponding set of quantization parameters. The quantized embedding groups are multiplied with a set of weights to generate an output.
Bibliography: Application Number: EP20220813818
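
Illustrative example (not part of the patent record): the Python/NumPy sketch below shows one plausible reading of the summarized method, splitting a layer's activation tensor along the embedding dimension into groups, computing a separate scale and zero point for each group, quantizing each group with its own parameters, and multiplying the result with a weight matrix. The group count, the 8-bit asymmetric scheme, and the min/max calibration are assumptions made for illustration, not details taken from the application.

import numpy as np

def quantize_per_group(activations, num_groups, num_bits=8):
    """Quantize each embedding group with its own scale and zero point
    (asymmetric uniform quantization), returning the dequantized tensor."""
    seq_len, d_model = activations.shape
    assert d_model % num_groups == 0, "embedding dim must split evenly into groups"
    group_size = d_model // num_groups
    qmin, qmax = 0, 2 ** num_bits - 1

    dequantized = np.empty_like(activations)
    for g in range(num_groups):
        cols = slice(g * group_size, (g + 1) * group_size)
        group = activations[:, cols]
        # Per-group quantization parameters derived from this group's own value range
        # (an assumed calibration choice; the patent does not specify one).
        lo, hi = float(group.min()), float(group.max())
        scale = max(hi - lo, 1e-8) / (qmax - qmin)
        zero_point = int(round(qmin - lo / scale))
        # Quantize to the integer grid, clamp, then dequantize for the matmul.
        q = np.clip(np.round(group / scale) + zero_point, qmin, qmax)
        dequantized[:, cols] = (q - zero_point) * scale
    return dequantized

# Usage: a mock activation tensor (sequence x embedding) from one layer,
# quantized per embedding group and then multiplied with a weight matrix.
rng = np.random.default_rng(0)
activations = rng.normal(size=(16, 64)).astype(np.float32)
weights = rng.normal(size=(64, 64)).astype(np.float32)
output = quantize_per_group(activations, num_groups=8) @ weights
print(output.shape)  # (16, 64)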