Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product


Bibliographic Details
Published in: 2020 IEEE 27th Symposium on Computer Arithmetic (ARITH), pp. 133-136
Main Authors: Hickmann, Brian; Chen, Jieasheng; Rotzin, Michael; Yang, Andrew; Urbanski, Maciej; Avancha, Sasikanth
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020

Summary: Intel's Nervana Neural Network Processor for Training (NNP-T) contains at its core an advanced floating point dot product design to accelerate the matrix multiplication operations found in many AI applications. Each Matrix Processing Unit (MPU) on the Intel NNP-T can process a 32x32 BFloat16 matrix multiplication every 32 cycles, accumulating the result in single precision (FP32). To reduce hardware costs, the MPU uses a fused many-term floating point dot product design with block alignment of the many input terms during addition, resulting in a unique datapath with several interesting design trade-offs. In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
ISSN: 2576-2265
DOI: 10.1109/ARITH48897.2020.00029
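
The block alignment named in the summary can be illustrated in miniature: instead of normalizing and rounding after every fused multiply-add, all products in the dot product are shifted to a common maximum exponent, summed as fixed-point integers, and normalized and rounded once at the end. Below is a minimal Python sketch of that idea under stated assumptions; it is not the paper's datapath. The 32-bit alignment window (align_bits), the truncating shifts, and the helper names to_bf16 and block_aligned_dot are illustrative choices, not values or names from the paper.

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Quantize a float to BFloat16 by truncating its FP32 bit pattern to the
    top 16 bits (1 sign, 8 exponent, 7 mantissa bits). Truncation rather than
    round-to-nearest keeps the sketch simple."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def block_aligned_dot(a, b, align_bits=32):
    """Many-term dot product with block alignment (illustrative): form all
    BF16 products exactly, shift each product's mantissa to the block's
    maximum exponent, accumulate as integers, then normalize/round once."""
    products = [to_bf16(x) * to_bf16(y) for x, y in zip(a, b)]
    # A BF16 x BF16 product has at most 16 significant bits, so it is exact
    # in a Python double and frexp recovers its true mantissa and exponent.
    max_exp = max((math.frexp(p)[1] for p in products if p), default=0)
    acc = 0
    for p in products:
        if not p:
            continue
        m, e = math.frexp(p)                # p == m * 2**e, 0.5 <= |m| < 1
        shift = align_bits - (max_exp - e)  # position in the fixed-point window
        if shift > 0:                       # terms below the window are dropped
            acc += int(m * (1 << shift))    # truncating alignment shift
    # Single normalization/rounding step back to FP32, vs. one per FMA.
    return struct.unpack("<f", struct.pack("<f",
                         acc * 2.0 ** (max_exp - align_bits)))[0]

# Example: the large terms cancel; the tiny term falls more than align_bits
# binades below the block maximum and is shifted out of the window.
a = [3.0e4, -3.0e4, 1.0e-6]
b = [1.0, 1.0, 1.0]
print(block_aligned_dot(a, b))                             # 0.0: tiny term dropped
print(sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b)))  # ~1e-6: serial reference
```

The final pair of prints shows the accuracy trade-off the paper analyzes: terms whose exponent lies more than the alignment width below the largest product are shifted entirely out of the fixed-point window and lost, whereas a chain of traditional FMAs would still register them.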