Flexible accelerator for sparse tensors in convolutional neural networks


Bibliographic Details
Main Authors: Kulkarni, Anand; Bandic, Zvonimir; Gunnam, Kiran
Format: Patent
Language: English
Published: 24.10.2023
Summary: An apparatus includes a tensor compute cluster having a plurality of tensor compute units to process a plurality of sub-feature maps in a machine learning application and a tensor memory cluster having a plurality of tensor feature map memory units to store the plurality of sub-feature maps. The apparatus also includes circuitry to partition an input feature map into the plurality of sub-feature maps such that sparsity in each of the plurality of sub-feature maps satisfies a predetermined threshold, and assign each of the plurality of sub-feature maps to one of the plurality of tensor compute units and one of the plurality of tensor feature map memory units for processing in parallel.
Bibliography: Application Number: US202016830129
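
The partition-and-assign step described in the summary can be illustrated in software. The following is a minimal sketch, not the patented circuitry. It rests on several assumptions that the record does not state: sparsity is taken to be the fraction of zero entries in a tile; the threshold is treated as "satisfied" when the gap between the most and least sparse tile is at most `threshold`; the feature map is split only into contiguous row bands; and units are assigned round-robin. All function and parameter names (`sparsity`, `split_rows`, `partition`, `assign`, `max_parts`, `threshold`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sparsity(tile: Matrix) -> float:
    """Fraction of entries in the tile that are exactly zero (assumed definition)."""
    total = sum(len(row) for row in tile)
    zeros = sum(1 for row in tile for value in row if value == 0)
    return zeros / total if total else 0.0


def split_rows(fmap: Matrix, parts: int) -> List[Matrix]:
    """Split the map into `parts` contiguous row bands of near-equal height."""
    n = len(fmap)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [fmap[bounds[i]:bounds[i + 1]] for i in range(parts)]


def partition(fmap: Matrix, max_parts: int, threshold: float) -> List[Matrix]:
    """Return the finest row-band split whose sparsity spread is within threshold.

    Assumption: "satisfies a predetermined threshold" is read here as
    max(tile sparsity) - min(tile sparsity) <= threshold. The record does
    not define the criterion, so this is only one possible reading.
    """
    # Never ask for more bands than there are rows, so no band is empty.
    for parts in range(min(max_parts, len(fmap)), 0, -1):
        tiles = split_rows(fmap, parts)
        values = [sparsity(t) for t in tiles]
        if max(values) - min(values) <= threshold:
            return tiles
    return [fmap]  # Reached only when the map has no rows.


def assign(num_tiles: int, compute_units: int,
           memory_units: int) -> List[Tuple[int, int, int]]:
    """Round-robin mapping: (tile index, compute unit, memory unit)."""
    return [(i, i % compute_units, i % memory_units) for i in range(num_tiles)]
```

Calling `partition` with `max_parts` equal to the number of compute units yields at most one tile per unit, and `assign` then pairs each tile with a compute unit and a memory unit so that the tiles could be processed in parallel. A real design would likely also split along columns and channels and might balance nonzero counts directly; those choices are outside what this record describes.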