Enhanced Floating-Point Multiply-Add with Full Denormal Support

This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA is executed in 4 cycles, fully pipelined, handles SSE/AVX operations for scalar/packed IEEE single and double p...

Bibliographic Details
Published in 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), pp. 143-150
Main Authors Sohn, Jongwook; Dean, David K.; Quintana, Eric; Wong, Wing Shek
Format Conference Proceeding
Language English
Published IEEE 04.09.2023
Abstract This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many applications. The proposed FMA executes in 4 cycles, is fully pipelined, handles SSE/AVX operations for scalar/packed IEEE single and double precision, and supports all four rounding modes. The proposed FMA also fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve the 4-cycle FMA with full denormal support, several optimization techniques are applied: one-way alignment, radix-16 Booth encoding for the multiplier, merged J-bit correction and aligned significand with the multiply array, modified leading zero anticipation (LZA) for masking the underflow, parallel sticky and all-ones detection with the normalization, and merged two's complement with the rounding logic. As a result, the proposed FMA achieves not only full denormal support but also about 10-30% reduced area and about 10-20% reduced latency compared to traditional FMAs.
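As an illustration of one technique named in the abstract, the following is a minimal software sketch of textbook radix-16 Booth recoding. It is not the paper's circuit: the function names `booth_radix16_digits` and `booth_multiply`, and the choice of an unsigned multiplier with one extra digit group, are assumptions made for this example. Each 4-bit group of the multiplier is turned into a signed digit in [-8, 8], so the multiplier needs roughly one partial product per 4 bits.

```python
def booth_radix16_digits(y: int, width: int) -> list[int]:
    """Recode an unsigned `width`-bit integer into radix-16 Booth digits.

    Each digit lies in [-8, 8] and sum(d * 16**i) == y.
    """
    if y < 0 or y >> width:
        raise ValueError("y must fit in `width` unsigned bits")
    digits = []
    prev_msb = 0  # bit just below the current group (zero for the first group)
    # One extra group guarantees the final group's top bit is zero,
    # so no correction term is left over for an unsigned input.
    for i in range(width // 4 + 1):
        group = (y >> (4 * i)) & 0xF
        msb = group >> 3
        # Interpret the group as signed 4-bit, then add the bit below it.
        digits.append(group - 16 * msb + prev_msb)
        prev_msb = msb
    return digits


def booth_multiply(x: int, y: int, width: int) -> int:
    """Multiply x by unsigned y by summing one partial product per digit."""
    digits = booth_radix16_digits(y, width)
    return sum((d * x) << (4 * i) for i, d in enumerate(digits))


if __name__ == "__main__":
    # A 53-bit double-precision significand yields 14 digits in this scheme.
    print(len(booth_radix16_digits((1 << 53) - 1, 53)))
    print(booth_multiply(12345, 67890, 17) == 12345 * 67890)
```

In hardware, the odd multiples of the multiplicand (3x, 5x, 7x) are usually precomputed, which is the usual cost of trading more complex digit selection for fewer partial products; how the paper handles this is not stated in the abstract.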
Author Sohn, Jongwook
Quintana, Eric
Dean, David K.
Wong, Wing Shek
Author_xml – sequence: 1
  givenname: Jongwook
  surname: Sohn
  fullname: Sohn, Jongwook
  email: jongwook.sohn@intel.com
  organization: Intel Corporation,Austin,TX,USA
– sequence: 2
  givenname: David K.
  surname: Dean
  fullname: Dean, David K.
  email: david.k.dean@intel.com
  organization: Intel Corporation,Austin,TX,USA
– sequence: 3
  givenname: Eric
  surname: Quintana
  fullname: Quintana, Eric
  email: eric.quintana@intel.com
  organization: Intel Corporation,Austin,TX,USA
– sequence: 4
  givenname: Wing Shek
  surname: Wong
  fullname: Wong, Wing Shek
  email: wing.shek.wong@intel.com
  organization: Intel Corporation,Austin,TX,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ARITH58626.2023.00015
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350319224
EISSN 2576-2265
EndPage 150
ExternalDocumentID 10461485
Genre orig-research
GroupedDBID 6IE
6IH
6IL
6IN
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
ID FETCH-LOGICAL-i204t-661c7a41707b0f7ea3b97e326ed42a1886cbe29aa9159f04615a5127c8bfe04d3
IEDL.DBID RIE
IngestDate Wed Jun 26 19:43:02 EDT 2024
IsPeerReviewed true
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-661c7a41707b0f7ea3b97e326ed42a1886cbe29aa9159f04615a5127c8bfe04d3
PageCount 8
ParticipantIDs ieee_primary_10461485
PublicationCentury 2000
PublicationDate 2023-Sept.-4
PublicationDateYYYYMMDD 2023-09-04
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-4
  day: 04
PublicationDecade 2020
PublicationTitle 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH)
PublicationTitleAbbrev ARITH
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib042469768
ssib054984879
Score 2.279909
Snippet This paper presents an enhanced floating-point multiply-add (FMA) design for the Intel E-Core processor. FMA is one of the most widely used operations in many...
SourceID ieee
SourceType Publisher
StartPage 143
SubjectTerms Delays
Digital arithmetic
Encoding
floating-point arithmetic
floating-point denormal numbers
Floating-point multiply-add
high-speed computer arithmetic
Logic arrays
Next generation networking
Optimization
Title Enhanced Floating-Point Multiply-Add with Full Denormal Support
URI https://ieeexplore.ieee.org/document/10461485
hasFullText 1
inHoldings 1
isFullTextHit
isPrint