Enhanced Floating-Point Multiply-Add with Full Denormal Support
Published in | 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH) pp. 143 - 150 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 04.09.2023 |
Summary: | This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations for scalar/packed IEEE single and double precision, and supports all four rounding modes. The proposed FMA also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve the 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment, radix-16 Booth encoding for the multiplier, merged J-bit correction and aligned significand with the multiply array, modified leading zero anticipation (LZA) for masking the underflow, parallel sticky and all-ones detection with the normalization, and merged two's complement with the rounding logic. As a result, the proposed FMA achieves not only full denormal support but also about 10-30% reduced area and about 10-20% reduced latency compared to traditional FMAs. |
---|---|
ISSN: | 2576-2265 |
DOI: | 10.1109/ARITH58626.2023.00015 |
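
Note: the summary names radix-16 Booth encoding as one of the multiplier optimizations. As a rough illustration of the general idea only, the sketch below shows the textbook radix-16 recoding of an unsigned multiplier into signed digits in the range -8 to 8, so that each 4-bit group contributes one partial product. It is not taken from the paper and does not model its circuit; the function names `booth_radix16_digits` and `booth_multiply` are hypothetical, chosen here for illustration.

```python
def booth_radix16_digits(multiplier: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-16 Booth digits.

    Each digit lies in [-8, 8]; digit i carries weight 16**i.
    (Textbook recoding, assumed here for illustration.)
    """
    if multiplier < 0 or multiplier >> width:
        raise ValueError("multiplier must fit in `width` unsigned bits")
    digits = []
    carry_in = 0  # most significant bit of the previous 4-bit group
    # One extra group beyond width // 4 guarantees the top group's MSB is 0,
    # so no correction term is left over after the last digit.
    for i in range(width // 4 + 1):
        nibble = (multiplier >> (4 * i)) & 0xF
        msb = nibble >> 3
        # Treat the nibble's top bit as weight -8 and add the previous MSB.
        digits.append(nibble - 16 * msb + carry_in)
        carry_in = msb
    return digits


def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Sum one shifted partial product per Booth digit."""
    digits = booth_radix16_digits(multiplier, width)
    return sum((d * multiplicand) << (4 * i) for i, d in enumerate(digits))


# A 53-bit double-precision significand needs 14 radix-16 digits.
sig_a = 0x1F_FFFF_FFFF_FFFF
sig_b = 0x10_0000_0000_0001
print(len(booth_radix16_digits(sig_b, 53)))
print(booth_multiply(sig_a, sig_b, 53) == sig_a * sig_b)
```

The trade-off in this kind of recoding is fewer partial products in exchange for needing odd multiples of the multiplicand (3x, 5x, 7x); how the paper's design handles that is not stated in the summary.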