Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product


Bibliographic Details
Published in: 2020 IEEE 27th Symposium on Computer Arithmetic (ARITH), pp. 133-136
Main Authors: Hickmann, Brian; Chen, Jieasheng; Rotzin, Michael; Yang, Andrew; Urbanski, Maciej; Avancha, Sasikanth
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020

Summary: Intel's Nervana Neural Network Processor for Training (NNP-T) contains at its core an advanced floating point dot product design to accelerate the matrix multiplication operations found in many AI applications. Each Matrix Processing Unit (MPU) on the Intel NNP-T can process a 32x32 BFloat16 matrix multiplication every 32 cycles, accumulating the result in single precision (FP32). To reduce hardware costs, the MPU uses a fused many-term floating point dot product design with block alignment of the many input terms during addition, resulting in a unique datapath with several interesting design trade-offs. In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
ISSN: 2576-2265
DOI: 10.1109/ARITH48897.2020.00029
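
The block alignment named in the summary can be illustrated in miniature: instead of normalizing and rounding after every fused multiply-add, all products in the dot product are shifted to a common maximum exponent, summed as fixed-point integers, and normalized and rounded once at the end. Below is a minimal Python sketch of that idea under stated assumptions; it is not the paper's datapath. The 32-bit alignment window (align_bits), the truncating shifts, and the helper names to_bf16 and block_aligned_dot are illustrative choices, not values or names from the paper.

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Quantize a float to BFloat16 by truncating its FP32 bit pattern to the
    top 16 bits (1 sign, 8 exponent, 7 mantissa bits). Truncation rather than
    round-to-nearest keeps the sketch simple."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def block_aligned_dot(a, b, align_bits=32):
    """Many-term dot product with block alignment (illustrative): form all
    BF16 products exactly, shift each product's mantissa to the block's
    maximum exponent, accumulate as integers, then normalize/round once."""
    products = [to_bf16(x) * to_bf16(y) for x, y in zip(a, b)]
    # A BF16 x BF16 product has at most 16 significant bits, so it is exact
    # in a Python double and frexp recovers its true mantissa and exponent.
    max_exp = max((math.frexp(p)[1] for p in products if p), default=0)
    acc = 0
    for p in products:
        if not p:
            continue
        m, e = math.frexp(p)                # p == m * 2**e, 0.5 <= |m| < 1
        shift = align_bits - (max_exp - e)  # position in the fixed-point window
        if shift > 0:                       # terms below the window are dropped
            acc += int(m * (1 << shift))    # truncating alignment shift
    # Single normalization/rounding step back to FP32, vs. one per FMA.
    return struct.unpack("<f", struct.pack("<f",
                         acc * 2.0 ** (max_exp - align_bits)))[0]

# Example: the large terms cancel; the tiny term falls more than align_bits
# binades below the block maximum and is shifted out of the window.
a = [3.0e4, -3.0e4, 1.0e-6]
b = [1.0, 1.0, 1.0]
print(block_aligned_dot(a, b))                             # 0.0: tiny term dropped
print(sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b)))  # ~1e-6: serial reference
```

The final pair of prints shows the accuracy trade-off the paper analyzes: terms whose exponent lies more than the alignment width below the largest product are shifted entirely out of the fixed-point window and lost, whereas a chain of traditional FMAs would still register them.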