Enhanced Floating-Point Multiply-Add with Full Denormal Support

Bibliographic Details
Published in: 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 143-150
Main Authors: Sohn, Jongwook; Dean, David K.; Quintana, Eric; Wong, Wing Shek
Format: Conference Proceeding
Language: English
Published: IEEE, 04.09.2023
Summary: This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations for scalar and packed IEEE single and double precision, and supports all four rounding modes. It also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve a 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment, radix-16 Booth encoding for the multiplier, merged J-bit correction and aligned significand with the multiply array, modified leading zero anticipation (LZA) for masking the underflow, parallel sticky and all-ones detection with the normalization, and merged two's complement with the rounding logic. As a result, the proposed FMA achieves not only full denormal support but also about 10-30% less area and about 10-20% lower latency than traditional FMAs.
ISSN: 2576-2265
DOI: 10.1109/ARITH58626.2023.00015
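
Note: The summary names radix-16 Booth encoding for the multiplier without giving details. As a rough illustration only, the sketch below shows the textbook form of radix-16 Booth recoding in Python: the multiplier is consumed 4 bits at a time (plus one overlap bit) and each group becomes a signed digit in [-8, 8], so a 53-bit double-precision significand needs 14 partial products. The function names `booth_radix16_digits` and `booth_multiply` are hypothetical, and nothing here is taken from the paper's actual circuit.

```python
def booth_radix16_digits(multiplier: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-16 Booth digits.

    Each digit lies in [-8, 8]. The digits satisfy
    sum(d * 16**i for i, d in enumerate(digits)) == multiplier.
    Textbook recoding; an assumption, not the paper's implementation.
    """
    if multiplier < 0 or multiplier >> width:
        raise ValueError("multiplier must fit in `width` unsigned bits")

    # One extra zero bit above the top keeps an unsigned value from being
    # read as negative, so we need ceil((width + 1) / 4) digits.
    num_digits = (width + 1 + 3) // 4

    digits = []
    overlap = 0  # implicit 0 to the right of bit 0
    for i in range(num_digits):
        group = (multiplier >> (4 * i)) & 0xF
        top = group >> 3  # most significant bit of the group, weight -8
        digits.append(overlap + (group & 0x7) - 8 * top)
        overlap = top     # becomes the +1 carry into the next digit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Sum the partial products selected by the radix-16 Booth digits."""
    total = 0
    for i, digit in enumerate(booth_radix16_digits(multiplier, width)):
        # In hardware each term is a precomputed multiple of the multiplicand
        # (0x to 8x, optionally negated) shifted left by 4*i bits.
        total += (digit * multiplicand) << (4 * i)
    return total


if __name__ == "__main__":
    significand = (1 << 52) | 0x000F_1234_5678_9ABC  # 53-bit value, hidden bit set
    digits = booth_radix16_digits(significand, 53)
    print(len(digits), digits)
    print(booth_multiply(3, significand, 53) == 3 * significand)
```

The appeal of the higher radix is fewer partial products to sum; the cost is that odd multiples (3x, 5x, 7x) of the multiplicand have to be formed before selection, which a radix-4 encoder avoids.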