Low latency matrix multiply unit

Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply un...

Full description

Saved in:
Bibliographic Details
Main Authors Jouppi, Norman Paul, Phelps, Andrew Everett
Format Patent
LanguageEnglish
Published 21.05.2024
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
Bibliography:Application Number: US202217985069