Exact Fused Dot Product Add Operators

Bibliographic Details
Published in	2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 151-158
Main Authors	Desrentes, Oregane; de Dinechin, Benoit Dupont; de Dinechin, Florent
Format	Conference Proceeding
Language	English
Published	IEEE 04.09.2023
Subjects	BF16; Computer architecture; Digital arithmetic; dot product; FP32; FP64; Heuristic algorithms; High dynamic range; Low latency communication; three-term sum

Summary:	This article explores architectures of exact (correctly rounded) fused dot product and add operators suitable for the FP32 and FP64 binary floating-point representations with subnormal support, and other representations with a wide dynamic range such as bfloat16. The exact summation of terms before rounding requires a full-size accumulator, and this work discusses techniques to compress the identical bits of this accumulator. This requires the computation of the relative shift amounts of the terms, which is formulated as a parallel prefix algorithm, allowing for a low-latency implementation. Architectural options for the exact fused dot product and add operators with up to 16 products for FP32, FP64 and mixed-precision BF16 to FP32 are evaluated using the TSMC 16FFC technology node.
ISSN:	2576-2265
DOI:	10.1109/ARITH58626.2023.00016
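
Illustration:	The summary describes an operator that sums all products and the addend exactly and rounds only once. The following is a minimal software sketch of that behaviour for FP64 inputs, not the hardware architecture evaluated in the paper. It assumes Python's arbitrary-precision rationals stand in for the full-size accumulator; the function names `exact_dot_add` and `naive_dot_add` are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def exact_dot_add(a, b, c):
    """Return c + sum(a[i] * b[i]) with one final rounding to binary64.

    Assumption: all inputs are finite floats. Fraction(x) converts a finite
    float exactly, so every product and every partial sum is exact.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    acc = Fraction(c)  # plays the role of the full-size accumulator
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)  # exact product, exact addition
    return float(acc)  # the only rounding step (round to nearest, ties to even)


def naive_dot_add(a, b, c):
    """Same computation, but rounding after every multiply and every add."""
    acc = c
    for x, y in zip(a, b):
        acc += x * y
    return acc


a = [1e16, 1.0, -1e16]
b = [1.0, 1.0, 1.0]
print(exact_dot_add(a, b, 0.0))  # 1.0
print(naive_dot_add(a, b, 0.0))  # 0.0
```

In the naive version the term 1.0 is absorbed when added to 1e16 and the later cancellation leaves 0.0, whereas the single-rounding version returns the exact result 1.0. This sketch does not handle infinities or NaN, which `Fraction` rejects.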