Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy

Bibliographic Details
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 25, no. 10, pp. 2803-2816
Main Authors Li, Huai-Ting, Chou, Ching-Yao, Hsieh, Yuan-Ting, Chu, Wei-Ching, Wu, An-Yeu
Format Journal Article
Language English
Published IEEE 01.10.2017
Abstract Reliability issues are more severe in multi/many-core systems because more devices are integrated in advanced technology nodes. To achieve robust computing in nanoscale designs, many circuit-level and architecture-level redundancy techniques have been proposed, but they impose a large, fixed silicon area overhead and lack flexibility. In recent years, some methods have exploited the "inherent core redundancy" of many-core systems to implicitly implement N-modular redundant (NMR) subsystems and thereby achieve area-efficient fault-tolerant computing. However, existing core-level redundancy methods become ineffective when soft error rate, task vulnerability, and task significance differ across the many-core system. To achieve robust computation in many-core systems with intercore variations and mixed workloads, we propose a variation-aware core-level redundancy scheme. The scheme presents two novel approaches: 1) we construct NMR tables that store the degree of redundancy, using mathematical models for systems affected by these variations, and 2) we dynamically allocate each replicated task to a proper core with variation-aware mapping algorithms to achieve high reliability. Based on a modified multicore simulator, Sniper-Transient Error Process Variation (TEVR), the experimental results show that the proposed scheme can increase reliability by 47.92% and achieve energy savings of 39% compared with conventional core-level redundancy methods.
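As a rough illustration of the two ideas named in the abstract (choosing a degree of redundancy, then mapping replicas to cores), the following minimal sketch uses the textbook majority-voting reliability model. It is not the authors' mathematical model or mapping algorithm: it assumes independent replica failures, an ideal voter, and a simple greedy choice of the most reliable cores, and the names `nmr_reliability` and `choose_redundancy` are hypothetical.

```python
def nmr_reliability(core_reliabilities):
    """Probability that a strict majority of replicas produce a correct result.

    Assumes independent failures and an ideal (fault-free) voter.
    Each entry is the probability that the replica on that core is correct.
    """
    # dp[k] = probability that exactly k of the replicas seen so far are correct
    dp = [1.0]
    for r in core_reliabilities:
        nxt = [0.0] * (len(dp) + 1)
        for k, p in enumerate(dp):
            nxt[k] += p * (1.0 - r)   # this replica is faulty
            nxt[k + 1] += p * r       # this replica is correct
        dp = nxt
    need = len(core_reliabilities) // 2 + 1  # strict majority
    return sum(dp[need:])


def choose_redundancy(core_reliabilities, target, max_n=7):
    """Greedy illustration: smallest odd N whose N most reliable cores meet the target.

    Returns (N, list of chosen core indices), or None if no odd N up to max_n suffices.
    """
    ranked = sorted(range(len(core_reliabilities)),
                    key=lambda i: core_reliabilities[i], reverse=True)
    for n in range(1, min(max_n, len(ranked)) + 1, 2):
        chosen = ranked[:n]
        if nmr_reliability([core_reliabilities[i] for i in chosen]) >= target:
            return n, chosen
    return None
```

With three identical cores of reliability 0.9, the majority-vote reliability under these assumptions is 0.9^3 + 3(0.9^2)(0.1) = 0.972. A per-task table built by calling `choose_redundancy` with different targets would play a role loosely analogous to the NMR tables the abstract mentions, though the paper's tables are derived from its own models.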
CODEN IEVSE9
ContentType Journal Article
DOI 10.1109/TVLSI.2017.2715803
Discipline Engineering
EISSN 1557-9999
EndPage 2816
ExternalDocumentID 10_1109_TVLSI_2017_2715803
7959084
Genre orig-research
GrantInformation_xml – fundername: Ministry of Science and Technology of Taiwan
  grantid: MOST 104-2220-E-002-003; MOST 105-2218-E-002-024
  funderid: 10.13039/501100004663
ISSN 1063-8210
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
ORCID 0000-0003-4731-8633
PageCount 14
PublicationCentury 2000
PublicationDate 2017-Oct.
2017-10-00
PublicationDateYYYYMMDD 2017-10-01
PublicationDate_xml – month: 10
  year: 2017
  text: 2017-Oct.
PublicationDecade 2010
PublicationTitle IEEE Transactions on Very Large Scale Integration (VLSI) Systems
PublicationTitleAbbrev TVLSI
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 2803
SubjectTerms Fault tolerance
Hardware
Integrated circuit reliability
many-core systems
Multicore processing
N-modular redundancy (NMR)
Redundancy
Robustness
software–hardware codesign
task mapping
variations
Title Variation-Aware Reliable Many-Core System Design by Exploiting Inherent Core Redundancy
URI https://ieeexplore.ieee.org/document/7959084
Volume 25