Enhanced Floating-Point Multiply-Add with Full Denormal Support
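Published in | 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 143-150 |
---|---|
Main Authors | Sohn, Jongwook; Dean, David K.; Quintana, Eric; Wong, Wing Shek |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 04.09.2023 |
Subjects | Delays; Digital arithmetic; Encoding; floating-point arithmetic; floating-point denormal numbers; Floating-point multiply-add; high-speed computer arithmetic; Logic arrays; Next generation networking; Optimization |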
Abstract | This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations on scalar and packed IEEE single- and double-precision data, and supports all four rounding modes. It also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve a 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment; radix-16 Booth encoding for the multiplier; merging the J-bit correction and the aligned significand into the multiply array; a modified leading zero anticipation (LZA) that masks the underflow; sticky and all-ones detection performed in parallel with normalization; and merging the two's complement step with the rounding logic. As a result, the proposed FMA achieves full denormal support together with about 10-30% less area and about 10-20% lower latency than traditional FMAs. |
---|---|
Author | Sohn, Jongwook; Quintana, Eric; Dean, David K.; Wong, Wing Shek |
Author_xml | 1. Jongwook Sohn (jongwook.sohn@intel.com), Intel Corporation, Austin, TX, USA; 2. David K. Dean (david.k.dean@intel.com), Intel Corporation, Austin, TX, USA; 3. Eric Quintana (eric.quintana@intel.com), Intel Corporation, Austin, TX, USA; 4. Wing Shek Wong (wing.shek.wong@intel.com), Intel Corporation, Austin, TX, USA |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ARITH58626.2023.00015 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library Online; IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | RIE: IEEE Electronic Library Online, https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ (source type: Publisher) |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798350319224 |
EISSN | 2576-2265 |
EndPage | 150 |
ExternalDocumentID | 10461485 |
Genre | orig-research |
GroupedDBID | 6IE 6IH 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
ID | FETCH-LOGICAL-i204t-661c7a41707b0f7ea3b97e326ed42a1886cbe29aa9159f04615a5127c8bfe04d3 |
IEDL.DBID | RIE |
IngestDate | Wed Jun 26 19:43:02 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i204t-661c7a41707b0f7ea3b97e326ed42a1886cbe29aa9159f04615a5127c8bfe04d3 |
PageCount | 8 |
ParticipantIDs | ieee_primary_10461485 |
PublicationCentury | 2000 |
PublicationDate | 2023-Sept.-4 |
PublicationDateYYYYMMDD | 2023-09-04 |
PublicationDate_xml | day: 04; month: 09; year: 2023; text: 2023-Sept.-4 |
PublicationDecade | 2020 |
PublicationTitle | 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH) |
PublicationTitleAbbrev | ARITH |
PublicationYear | 2023 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssib042469768 ssib054984879 |
Score | 2.279909 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 143 |
SubjectTerms | Delays; Digital arithmetic; Encoding; floating-point arithmetic; floating-point denormal numbers; Floating-point multiply-add; high-speed computer arithmetic; Logic arrays; Next generation networking; Optimization |
Title | Enhanced Floating-Point Multiply-Add with Full Denormal Support |
URI | https://ieeexplore.ieee.org/document/10461485 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link.rule.ids | 310,311,783,787,792,793,799,27937,55086 |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2023+IEEE+30th+Symposium+on+Computer+Arithmetic+%28ARITH%29&rft.atitle=Enhanced+Floating-Point+Multiply-Add+with+Full+Denormal+Support&rft.au=Sohn%2C+Jongwook&rft.au=Dean%2C+David+K.&rft.au=Quintana%2C+Eric&rft.au=Wong%2C+Wing+Shek&rft.date=2023-09-04&rft.pub=IEEE&rft.eissn=2576-2265&rft.spage=143&rft.epage=150&rft_id=info:doi/10.1109%2FARITH58626.2023.00015&rft.externalDocID=10461485 |
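The abstract's central claims are that a*b+c is rounded only once and that denormal (subnormal) inputs and underflow outputs are handled without microcode. Hardware FMA units are commonly checked against a bit-exact software reference, and the Python sketch below is one such reference for binary64, offered as an illustration only. It is not the paper's datapath: it has no one-way alignment, Booth array, or LZA. Instead it computes the exact rational result with `fractions.Fraction` and rounds that result once. The function names `round_binary64` and `fma_ref` and the mode strings `"rne"`, `"rz"`, `"ru"`, `"rd"` are hypothetical. The sketch assumes finite inputs, raises no exception flags, and does not model IEEE signed-zero rules for results that are exactly zero.

```python
from fractions import Fraction
import math
import sys

MANT_BITS = 52   # binary64 stored fraction bits
EMIN = -1022     # smallest normal exponent
EMAX = 1023      # largest normal exponent
ROUNDING_MODES = ("rne", "rz", "ru", "rd")  # hypothetical names for the four IEEE modes


def round_binary64(x: Fraction, mode: str = "rne") -> float:
    """Round an exact rational to binary64 once, producing subnormals when needed."""
    if mode not in ROUNDING_MODES:
        raise ValueError(f"unknown rounding mode: {mode!r}")
    if x == 0:
        return 0.0  # simplification: signed-zero rules for exact zeros are not modeled
    sign = -1 if x < 0 else 1
    a = abs(x)

    # Find e with 2**e <= a < 2**(e + 1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1

    # Clamping e at EMIN pins the last-place weight at 2**-1074 for tiny values,
    # which is gradual underflow (subnormal results).
    e = max(e, EMIN)
    scaled = a * Fraction(2) ** (MANT_BITS - e)  # magnitude in units of the last place
    q = scaled.numerator // scaled.denominator    # truncated significand (integer)
    r = scaled - q                                # discarded part, 0 <= r < 1

    if r:
        if mode == "rne":
            round_up = r > Fraction(1, 2) or (r == Fraction(1, 2) and q % 2 == 1)
        elif mode == "ru":
            round_up = sign > 0
        elif mode == "rd":
            round_up = sign < 0
        else:  # "rz": truncate toward zero
            round_up = False
        if round_up:
            q += 1  # may carry to 2**53, i.e. into the next binade

    if e > EMAX or (e == EMAX and q == 1 << (MANT_BITS + 1)):
        # Overflow: infinity or the largest finite value, depending on direction.
        to_inf = mode == "rne" or (mode == "ru" and sign > 0) or (mode == "rd" and sign < 0)
        magnitude = math.inf if to_inf else sys.float_info.max
    else:
        magnitude = math.ldexp(q, e - MANT_BITS)  # exact because q <= 2**53
    return math.copysign(magnitude, sign)


def fma_ref(a: float, b: float, c: float, mode: str = "rne") -> float:
    """Reference a*b + c with a single rounding; finite inputs only."""
    if not all(math.isfinite(v) for v in (a, b, c)):
        raise ValueError("this sketch handles finite inputs only")
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # Fraction(float) is exact
    return round_binary64(exact, mode)
```

Usage, under the same assumptions: `fma_ref(0.1, 10.0, -1.0)` returns `2.0 ** -54`, because the exact product 1 + 2^-54 is kept until the single rounding. The unfused `0.1 * 10.0 - 1.0` instead rounds the product to 1.0 first and yields 0.0. `fma_ref(5e-324, 0.5, 0.0)` is an exact tie below the smallest subnormal. It rounds to 0.0 under `"rne"` and to `5e-324` under `"ru"`. On Python 3.13 or later, `"rne"` results can be cross-checked against `math.fma`.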