Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy
Published in | IEEE transactions on very large scale integration (VLSI) systems Vol. 25; no. 10; pp. 2803 - 2816 |
---|---|
Main Authors | Huai-Ting Li, Ching-Yao Chou, Yuan-Ting Hsieh, Wei-Ching Chu, An-Yeu Wu |
Format | Journal Article |
Language | English |
Published | IEEE, 01.10.2017 |
Abstract | Reliability issues are more severe in multi/many-core systems because more devices are integrated in advanced technology nodes. To achieve robust computing in nanoscale designs, many circuit-level and architecture-level redundancy techniques have been proposed, but they incur a large, fixed silicon area overhead and lack flexibility. In recent years, some methods have exploited the "inherent core redundancy" of many-core systems to implicitly implement N-modular redundant (NMR) subsystems and achieve area-efficient fault-tolerant computing. However, existing core-level redundancy methods become ineffective when the soft error rate, task vulnerability, and task significance vary across the many-core system. To achieve robust computation in many-core systems with intercore variations and mixed workloads, we propose a variation-aware core-level redundancy scheme. The scheme presents two novel approaches: 1) we construct NMR tables that store the degree of redundancy, using mathematical models for systems affected by these variations, and 2) we dynamically allocate each replicated task to a suitable core with variation-aware mapping algorithms to achieve high reliability. Experimental results based on a modified multicore simulator, Sniper-Transient Error Process Variation (TEVR), show that the proposed scheme can increase reliability by 47.92% and achieve an energy saving of 39% compared with conventional core-level redundancy methods. |
---|---|
Author | Huai-Ting Li (ORCID 0000-0003-4731-8633); Ching-Yao Chou; Yuan-Ting Hsieh; Wei-Ching Chu; An-Yeu Wu |
CODEN | IEVSE9 |
ContentType | Journal Article |
DOI | 10.1109/TVLSI.2017.2715803 |
Discipline | Engineering |
EISSN | 1557-9999 |
EndPage | 2816 |
ExternalDocumentID | 10_1109_TVLSI_2017_2715803 7959084 |
Genre | orig-research |
GrantInformation | Ministry of Science and Technology of Taiwan; grants MOST 104-2220-E-002-003 and MOST 105-2218-E-002-024; funder ID 10.13039/501100004663 |
ISSN | 1063-8210 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 10 |
ORCID | 0000-0003-4731-8633 |
PageCount | 14 |
PublicationDate | 2017-Oct. 2017-10-00 |
PublicationDateYYYYMMDD | 2017-10-01 |
PublicationTitle | IEEE transactions on very large scale integration (VLSI) systems |
PublicationTitleAbbrev | TVLSI |
PublicationYear | 2017 |
Publisher | IEEE |
StartPage | 2803 |
SubjectTerms | Fault tolerance; Hardware; Integrated circuit reliability; many-core systems; Multicore processing; Nuclear magnetic resonance; Redundancy; Robustness; software–hardware codesign; task mapping; variations |
Title | Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy |
URI | https://ieeexplore.ieee.org/document/7959084 |
Volume | 25 |
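
The abstract describes NMR tables that store a degree of redundancy derived from mathematical models. As a rough illustration only, and not the authors' model, the Python sketch below computes the textbook majority-vote reliability of an NMR group whose cores have different per-task success probabilities, then picks the smallest odd N that meets a target. The function names, the perfect-voter assumption, the independence of core failures, and all numeric values are assumptions introduced here.

```python
from itertools import combinations
from math import prod


def nmr_reliability(core_reliabilities):
    """Probability that a strict majority of replicas is correct.

    Assumes (hypothetically) independent core failures and a perfect voter.
    Each entry is the probability that one core finishes the task correctly.
    """
    n = len(core_reliabilities)
    majority = n // 2 + 1
    total = 0.0
    # Sum over every set of correct cores that forms a strict majority.
    for k in range(majority, n + 1):
        for good in combinations(range(n), k):
            good_set = set(good)
            total += prod(
                r if i in good_set else 1.0 - r
                for i, r in enumerate(core_reliabilities)
            )
    return total


def smallest_redundancy(core_reliabilities, target):
    """Smallest odd N whose N most reliable cores meet the target, else None.

    Illustrative selection rule only; the paper's NMR tables may be built
    differently.
    """
    ranked = sorted(core_reliabilities, reverse=True)
    for n in range(1, len(ranked) + 1, 2):
        if nmr_reliability(ranked[:n]) >= target:
            return n
    return None
```

For three identical cores with reliability 0.9, `nmr_reliability` gives 3(0.9)^2 - 2(0.9)^3 = 0.972, the standard triple-modular-redundancy result, so a target of 0.97 would select N = 3 under these assumptions.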