Enhanced Phase-Driven Q-Learning-Based DRM for Multicore Processors

Bibliographic Details
Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 38; no. 11; pp. 2022-2031
Main Authors Yang, Zhiyuan, Serafy, Caleb, Lu, Tiantao, Srivastava, Ankur
Format Journal Article
Language English
Published IEEE 01.11.2019
Abstract In this paper, we propose a new dynamic reliability management (DRM) technique for multicore processors based on a phase-driven Q-learning method. Our technique considers a wide range of long-term reliability issues and maximizes the throughput of the processor subject to a reliability constraint. We employ ON/OFF switching actions and dynamic voltage and frequency scaling as control knobs (i.e., working modes) to tune the state of the processor's cores. To achieve this, our technique detects program phases and adaptively determines the optimal working mode for each phase using Q-learning. By integrating phase detection into Q-learning-based management, our technique can efficiently manage programs with highly diverse phases. We also propose three additional modules that improve the management efficiency of our technique. To evaluate the technique, we use it to manage a 3-D CPU running programs with highly diverse phases. Several failure mechanisms are considered in this case study. Our proposed technique is compared with two existing Q-learning-based techniques. The experimental results demonstrate that, when the number of phases is smaller than the number of working modes, our technique achieves more than a 1.36× improvement in performance with 60% memory space savings.
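The combination of phase detection and Q-learning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the working modes, the feature vector (IPC and cache miss rate), the distance threshold, the reward function, and all class and function names below are assumptions made purely for illustration. The sketch keeps one row of Q-values per detected phase and learns, per phase, the fastest mode that respects a made-up reliability budget.

```python
import random
from collections import defaultdict

# Hypothetical working modes: core OFF plus three DVFS levels (GHz).
MODES = ["off", 1.0, 1.5, 2.0]


class PhaseDetector:
    """Toy phase detector: nearest-signature matching on a feature vector."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed L1-distance cutoff for a new phase
        self.signatures = []        # one representative vector per phase

    def detect(self, features):
        for phase_id, sig in enumerate(self.signatures):
            dist = sum(abs(a - b) for a, b in zip(features, sig))
            if dist <= self.threshold:
                return phase_id
        # No known phase is close enough: register a new one.
        self.signatures.append(tuple(features))
        return len(self.signatures) - 1


class PhaseQLearner:
    """Tabular Q-learning with one row of Q-values per detected phase."""

    def __init__(self, alpha=0.5, gamma=0.5, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * len(MODES))
        self.rng = random.Random(seed)

    def _greedy(self, phase):
        row = self.q[phase]
        return max(range(len(MODES)), key=row.__getitem__)

    def choose(self, phase):
        # Epsilon-greedy selection over the working modes.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(MODES))
        return self._greedy(phase)

    def update(self, phase, action, reward, next_phase):
        # Standard Q-learning update with the phase as the state.
        target = reward + self.gamma * max(self.q[next_phase])
        self.q[phase][action] += self.alpha * (target - self.q[phase][action])

    def best_mode(self, phase):
        return MODES[self._greedy(phase)]


def toy_reward(phase, mode):
    """Made-up reward: throughput (frequency) minus a penalty when an
    assumed per-phase reliability budget (max safe frequency) is exceeded."""
    freq = 0.0 if mode == "off" else mode
    safe_freq = [2.0, 1.5][phase]  # phase 1 is assumed to stress the chip more
    penalty = 5.0 if freq > safe_freq else 0.0
    return freq - penalty


def train(steps=2000, phase_length=50):
    detector, learner = PhaseDetector(), PhaseQLearner()
    # Two synthetic phases described by (IPC, cache miss rate) features.
    trace = [(1.8, 0.05), (0.6, 0.40)]
    for t in range(steps):
        phase = detector.detect(trace[(t // phase_length) % 2])
        action = learner.choose(phase)
        reward = toy_reward(phase, MODES[action])
        next_phase = detector.detect(trace[((t + 1) // phase_length) % 2])
        learner.update(phase, action, reward, next_phase)
    return detector, learner


if __name__ == "__main__":
    detector, learner = train()
    for phase_id in range(len(detector.signatures)):
        print(phase_id, learner.best_mode(phase_id))
```

Keying the Q-table by phase rather than by a fine-grained system state keeps the table small, which is one plausible reading of why the abstract reports memory savings when phases are fewer than working modes; the sketch does not attempt to reproduce those results.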
Author_xml – sequence: 1
  givenname: Zhiyuan
  orcidid: 0000-0002-2250-7959
  surname: Yang
  fullname: Yang, Zhiyuan
  email: zyyang@umd.edu
  organization: Electrical and Computer Engineering Department, University of Maryland at College Park, College Park, MD, USA
– sequence: 2
  givenname: Caleb
  surname: Serafy
  fullname: Serafy, Caleb
  organization: SOC Power Team, Apple, Cupertino, CA, USA
– sequence: 3
  givenname: Tiantao
  surname: Lu
  fullname: Lu, Tiantao
  organization: ICD Block Implementation Team, Cadence Design Systems Inc., San Jose, CA, USA
– sequence: 4
  givenname: Ankur
  surname: Srivastava
  fullname: Srivastava, Ankur
  organization: Electrical and Computer Engineering Department, University of Maryland at College Park, College Park, MD, USA
CODEN ITCSDI
CitedBy_id crossref_primary_10_1109_TCAD_2022_3158832
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TCAD.2018.2877014
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1937-4151
EndPage 2031
ExternalDocumentID 10_1109_TCAD_2018_2877014
8500286
Genre orig-research
IEDL.DBID RIE
ISSN 0278-0070
IngestDate Tue Jul 01 00:30:50 EDT 2025
Thu Apr 24 22:55:42 EDT 2025
Wed Aug 27 02:43:04 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-2250-7959
PageCount 10
ParticipantIDs crossref_citationtrail_10_1109_TCAD_2018_2877014
crossref_primary_10_1109_TCAD_2018_2877014
ieee_primary_8500286
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2019-Nov.
2019-11-00
PublicationDateYYYYMMDD 2019-11-01
PublicationDate_xml – month: 11
  year: 2019
  text: 2019-Nov.
PublicationDecade 2010
PublicationTitle IEEE transactions on computer-aided design of integrated circuits and systems
PublicationTitleAbbrev TCAD
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0014529
Score 2.2795436
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 2022
SubjectTerms Dynamic reliability management (DRM)
Frequency control
Multicore processing
Phase detection
reinforcement learning
Reliability
Temperature sensors
thermal prediction
Throughput
Title Enhanced Phase-Driven Q-Learning-Based DRM for Multicore Processors
URI https://ieeexplore.ieee.org/document/8500286
Volume 38
hasFullText 1
inHoldings 1
isFullTextHit
isPrint