Enhanced Floating-Point Multiply-Add with Full Denormal Support

Bibliographic Details
Published in	2023 IEEE 30th Symposium on Computer Arithmetic (ARITH) pp. 143 - 150
Main Authors	Sohn, Jongwook; Dean, David K.; Quintana, Eric; Wong, Wing Shek
Format	Conference Proceeding
Language	English
Published	IEEE 04.09.2023
Subjects	Delays; Digital arithmetic; Encoding; floating-point arithmetic; floating-point denormal numbers; Floating-point multiply-add; high-speed computer arithmetic; Logic arrays; Next generation networking; Optimization

Abstract	This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations for scalar and packed IEEE single and double precision, and supports all four rounding modes. It also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve a 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment, radix-16 Booth encoding for the multiplier, J-bit correction and aligned significand merged with the multiply array, modified leading zero anticipation (LZA) for masking the underflow, sticky and all-ones detection in parallel with normalization, and two's complement merged with the rounding logic. As a result, the proposed FMA achieves not only full denormal support but also about 10-30% less area and about 10-20% lower latency than traditional FMAs.
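Illustrative note: the abstract names radix-16 Booth encoding as one of the multiplier optimizations. The sketch below shows only the generic textbook recoding, in which an unsigned multiplier is rewritten as signed digits in the range -8 to 8, one digit per 4 bits. It is not the paper's circuit; the function names `booth_radix16_digits` and `booth_multiply` and the `width` parameter are hypothetical and chosen for this example.

```python
def booth_radix16_digits(multiplier: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit integer into radix-16 Booth digits.

    Each digit lies in [-8, 8]; the digits are returned least significant
    first, and sum(d * 16**i) reproduces the original value.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    if multiplier < 0 or multiplier >> width:
        raise ValueError("multiplier must fit in `width` unsigned bits")
    # The input is unsigned, so one extra digit absorbs the final carry.
    num_digits = width // 4 + 1
    digits = []
    prev_top_bit = 0  # implicit 0 to the right of the least significant bit
    for i in range(num_digits):
        group = (multiplier >> (4 * i)) & 0xF
        top_bit = group >> 3
        # Overlapped 5-bit window: -8*b3 + 4*b2 + 2*b1 + b0 + (bit below the group)
        digits.append(group - 16 * top_bit + prev_top_bit)
        prev_top_bit = top_bit
    return digits


def booth_multiply(a: int, b: int, width: int) -> int:
    """Multiply a by b by summing one signed partial product per Booth digit of b."""
    return sum(d * a * 16**i for i, d in enumerate(booth_radix16_digits(b, width)))
```

Under this sketch a 53-bit double-precision significand recodes into 14 digits, so the product is a sum of 14 signed partial products. In hardware, the odd multiples (3x, 5x, 7x) that such digits require are generally not simple shifts, which is the usual cost of a higher radix.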
Author Details	Sohn, Jongwook (jongwook.sohn@intel.com); Dean, David K. (david.k.dean@intel.com); Quintana, Eric (eric.quintana@intel.com); Wong, Wing Shek (wing.shek.wong@intel.com); all with Intel Corporation, Austin, TX, USA
CODEN	IEEPAD
DOI	10.1109/ARITH58626.2023.00015
EISBN	9798350319224
EISSN	2576-2265
IsPeerReviewed	true
IsScholarly	true
URI	https://ieeexplore.ieee.org/document/10461485