Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product
Published in: 2020 IEEE 27th Symposium on Computer Arithmetic (ARITH), pp. 133-136
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020
Summary: Intel's Nervana Neural Network Processor for Training (NNP-T) contains at its core an advanced floating point dot product design to accelerate the matrix multiplication operations found in many AI applications. Each Matrix Processing Unit (MPU) on the Intel NNP-T can process a 32x32 BFloat16 matrix multiplication every 32 cycles, accumulating the result in single precision (FP32). To reduce hardware costs, the MPU uses a fused many-term floating point dot product design with block alignment of the many input terms during addition, resulting in a unique datapath with several interesting design trade-offs. In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
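The block-alignment scheme mentioned in the summary can be illustrated with a small numeric model. The sketch below is an assumption-laden simplification, not the actual NNP-T datapath: the helper names `bf16` and `block_aligned_dot` are hypothetical, and the model aligns every product of a dot product to the largest term's exponent, truncates each to a fixed-width integer accumulator, and sums them in one fused step, in contrast to a sequence of individually rounded FMAs.

```python
import math
import struct

def bf16(x):
    """Truncate a Python float to BFloat16 precision (8 exponent bits,
    7 explicit mantissa bits) by zeroing the low 16 bits of its
    float32 representation. Illustrative helper, not Intel's rounding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def block_aligned_dot(a, b, acc_bits=32):
    """Simplified model of a fused many-term dot product with block
    alignment: every product is shifted to the exponent of the largest
    term, truncated to an acc_bits-wide integer, and accumulated once."""
    prods = [x * y for x, y in zip(a, b)]
    if not any(prods):
        return 0.0
    # The largest exponent among the products defines the block alignment.
    emax = max(math.frexp(p)[1] for p in prods if p != 0.0)
    scale = 2.0 ** (acc_bits - emax)
    # Align and truncate each term, then sum in a single wide accumulator.
    return sum(int(p * scale) for p in prods) / scale

# Compare against a term-by-term reference sum on BFloat16 inputs.
a = [bf16(v) for v in (0.1, 2.5, -1.25, 3.0)]
b = [bf16(v) for v in (4.0, 0.5, 2.0, -0.75)]
print(block_aligned_dot(a, b), math.fsum(x * y for x, y in zip(a, b)))
```

Because small terms lose their low-order bits when shifted to the block exponent, the fused result can differ slightly from a sequence of FP32 FMAs; that accuracy trade-off is what the paper quantifies.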
ISSN: 2576-2265
DOI: 10.1109/ARITH48897.2020.00029