Exact Fused Dot Product Add Operators

Bibliographic Details
Published in: 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 151-158
Main Authors: Desrentes, Oregane; de Dinechin, Benoit Dupont; de Dinechin, Florent
Format: Conference Proceeding
Language: English
Published: IEEE, 04.09.2023

Summary: This article explores architectures of exact (correctly rounded) fused dot product and add operators suitable for the FP32 and FP64 binary floating-point representations with subnormal support, and for other representations with a wide dynamic range such as bfloat16. The exact summation of terms before rounding requires a full-size accumulator, and this work discusses techniques to compress the identical bits of this accumulator. This requires computing the relative shift amounts of the terms, which is formulated as a parallel prefix algorithm, allowing for a low-latency implementation. Architectural options for the exact fused dot product and add operators with up to 16 products for FP32, FP64 and mixed-precision BF16 to FP32 are evaluated using the TSMC 16FFC technology node.
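The summary notes that exact summation before rounding requires a full-size accumulator. As a rough software analogue (not the paper's hardware architecture), the Python sketch below computes a correctly rounded FP64 result of c + sum(a[i] * b[i]). It converts every operand to an exact fixed-point integer, accumulates in a Python int that stands in for the wide accumulator, and rounds only once at the end. The names `to_fixed`, `fused_dot_add` and `FP64_FRAC_BITS` are hypothetical. The sketch assumes finite inputs and ignores signed zeros, infinities and NaNs. It also omits the accumulator compression and parallel-prefix shift computation that the paper actually studies.

```python
from typing import Sequence

# IEEE 754 binary64: every finite double is an integer multiple of 2**-1074
# (the smallest subnormal), so it is exactly an integer in fixed point
# with LSB weight 2**-1074. (Hypothetical constant name.)
FP64_FRAC_BITS = 1074


def to_fixed(x: float) -> int:
    """Return n such that x == n * 2**-FP64_FRAC_BITS exactly (x finite)."""
    num, den = x.as_integer_ratio()     # den is a power of two for floats
    den_log2 = den.bit_length() - 1     # den == 2**den_log2, den_log2 <= 1074
    return num << (FP64_FRAC_BITS - den_log2)


def fused_dot_add(a: Sequence[float], b: Sequence[float], c: float) -> float:
    """Correctly rounded c + sum(a[i] * b[i]) using one wide integer accumulator.

    A product of two fixed-point values with 1074 fraction bits has
    2 * 1074 fraction bits, so the accumulator uses LSB weight 2**-2148.
    Python ints have arbitrary width, so no product or partial sum is lost.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    acc_frac_bits = 2 * FP64_FRAC_BITS
    acc = to_fixed(c) << FP64_FRAC_BITS   # align the addend to LSB 2**-2148
    for x, y in zip(a, b):
        acc += to_fixed(x) * to_fixed(y)  # exact product, exact sum
    # Single rounding step: CPython's int / int true division is correctly
    # rounded (round-half-to-even), including subnormal results. A result
    # beyond the FP64 range raises OverflowError instead of returning inf.
    return acc / (1 << acc_frac_bits)
```

For example, `fused_dot_add([0.1], [10.0], -1.0)` returns `2**-54` (about 5.55e-17). This is the exact residual of the double nearest 0.1 multiplied by 10. By contrast, evaluating `0.1 * 10.0 - 1.0` with ordinary FP64 operations gives `0.0`. For FP64 products, the accumulator spans from 2**-2148 up to about 2**2048 (roughly 4,200 bits plus carry bits), which illustrates why a hardware implementation benefits from compressing it.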
ISSN: 2576-2265
DOI: 10.1109/ARITH58626.2023.00016