Enhanced Floating-Point Multiply-Add with Full Denormal Support

Bibliographic Details
Published in	2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 143-150
Main Authors	Sohn, Jongwook; Dean, David K.; Quintana, Eric; Wong, Wing Shek
Format	Conference Proceeding
Language	English
Published	IEEE 04.09.2023
Subjects	Delays; Digital arithmetic; Encoding; Floating-point arithmetic; Floating-point denormal numbers; Floating-point multiply-add; High-speed computer arithmetic; Logic arrays; Next generation networking; Optimization
Summary:	This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations for scalar/packed IEEE single and double precision, and supports all four rounding modes. The proposed FMA also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve the 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment, radix-16 Booth encoding for the multiplier, merged J-bit correction and aligned significand with the multiply array, modified leading zero anticipation (LZA) for masking the underflow, parallel sticky and all-ones detection with the normalization, and merged two's complement with the rounding logic. As a result, the proposed FMA achieves not only full denormal support but also about 10-30% reduced area and about 10-20% reduced latency compared to traditional FMAs.
ISSN:	2576-2265
DOI:	10.1109/ARITH58626.2023.00015
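
Illustration (not taken from the paper): the summary lists radix-16 Booth encoding for the multiplier among its techniques. This record does not describe the authors' encoder, so the Python sketch below shows only the textbook recoding, under the assumption of an unsigned multiplier such as a significand. The function names `booth_radix16_digits` and `booth_multiply` are hypothetical. Each digit is formed from four multiplier bits plus the top bit of the group below it and falls in the range -8 to 8.

```python
def booth_radix16_digits(multiplier: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit multiplier into radix-16 Booth digits.

    Textbook recoding only (an assumption, not the paper's circuit):
    digit_i = -8*b[4i+3] + 4*b[4i+2] + 2*b[4i+1] + b[4i] + b[4i-1],
    with b[-1] = 0. Digits are returned least significant first.
    """
    if multiplier < 0 or multiplier >> width:
        raise ValueError("multiplier must fit in `width` unsigned bits")

    # One digit per 4 bits, plus one position to absorb the final carry.
    num_digits = width // 4 + 1
    digits = []
    prev_top_bit = 0  # b[-1]
    for i in range(num_digits):
        group = (multiplier >> (4 * i)) & 0xF
        top_bit = group >> 3
        # Low three bits count positively, the top bit counts as -8,
        # and the previous group's top bit carries in as +1.
        digits.append((group & 0x7) - 8 * top_bit + prev_top_bit)
        prev_top_bit = top_bit
    return digits


def booth_multiply(a: int, b: int, width: int) -> int:
    """Multiply by summing one partial product per radix-16 Booth digit of b."""
    return sum(d * a * 16**i for i, d in enumerate(booth_radix16_digits(b, width)))


if __name__ == "__main__":
    print(booth_radix16_digits(0xFF, 8))
    print(booth_multiply(12345, 0xFF, 8) == 12345 * 0xFF)
```

With this recoding, a 53-bit double-precision significand yields 14 partial products, compared with 27 for radix-4 Booth recoding. The cost is that the odd multiples 3x, 5x, and 7x of the multiplicand are no longer simple shifts and must be generated separately.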