Enhanced Phase-Driven Q-Learning-Based DRM for Multicore Processors
In this paper, we propose a new dynamic reliability management technique for multicore processors using a phase-driven Q-learning-based method. Our technique considers a wide range of long-term reliability issues and maximizes the throughput of the processor subject to the reliability constraint. We e...
Published in | IEEE transactions on computer-aided design of integrated circuits and systems Vol. 38; no. 11; pp. 2022 - 2031 |
---|---|
Main Authors | Yang, Zhiyuan; Serafy, Caleb; Lu, Tiantao; Srivastava, Ankur |
Format | Journal Article |
Language | English |
Published | IEEE 01.11.2019 |
Abstract | In this paper, we propose a new dynamic reliability management technique for multicore processors using a phase-driven Q-learning-based method. Our technique considers a wide range of long-term reliability issues and maximizes the throughput of the processor subject to the reliability constraint. We employ ON/OFF switching actions and dynamic voltage and frequency scaling as control knobs (i.e., working modes) to tune the state of the processor's cores. To achieve this, our technique detects program phases and adaptively determines the optimal working modes for each phase using the Q-learning-based method. By integrating phase detection into the Q-learning-based management, our technique can provide efficient management for programs with highly diverse phases. We also propose three additional modules to improve the management efficiency of our technique. To evaluate our technique, we use it to manage a 3-D CPU running highly diverse programs. Several failure mechanisms are considered in this case study. Our proposed technique is compared with two existing Q-learning-based techniques. The experimental results demonstrate that when the number of phases is smaller than the number of working modes, our technique can achieve more than 1.36× improvement in performance with 60% memory space savings. |
---|---|
Author | Serafy, Caleb Lu, Tiantao Srivastava, Ankur Yang, Zhiyuan |
Author_xml | – sequence: 1 givenname: Zhiyuan orcidid: 0000-0002-2250-7959 surname: Yang fullname: Yang, Zhiyuan email: zyyang@umd.edu organization: Electrical and Computer Engineering Department, University of Maryland at College Park, College Park, MD, USA – sequence: 2 givenname: Caleb surname: Serafy fullname: Serafy, Caleb organization: SOC Power Team, Apple, Cupertino, CA, USA – sequence: 3 givenname: Tiantao surname: Lu fullname: Lu, Tiantao organization: ICD Block Implementation Team, Cadence Design Systems Inc., San Jose, CA, USA – sequence: 4 givenname: Ankur surname: Srivastava fullname: Srivastava, Ankur organization: Electrical and Computer Engineering Department, University of Maryland at College Park, College Park, MD, USA |
CODEN | ITCSDI |
CitedBy_id | crossref_primary_10_1109_TCAD_2022_3158832 |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION |
DOI | 10.1109/TCAD.2018.2877014 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 1937-4151 |
EndPage | 2031 |
ExternalDocumentID | 10_1109_TCAD_2018_2877014 8500286 |
Genre | orig-research |
IEDL.DBID | RIE |
ISSN | 0278-0070 |
IngestDate | Tue Jul 01 00:30:50 EDT 2025 Thu Apr 24 22:55:42 EDT 2025 Wed Aug 27 02:43:04 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
LinkModel | DirectLink |
ORCID | 0000-0002-2250-7959 |
PageCount | 10 |
ParticipantIDs | crossref_citationtrail_10_1109_TCAD_2018_2877014 crossref_primary_10_1109_TCAD_2018_2877014 ieee_primary_8500286 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2019-Nov. 2019-11-00 |
PublicationDateYYYYMMDD | 2019-11-01 |
PublicationDate_xml | – month: 11 year: 2019 text: 2019-Nov. |
PublicationDecade | 2010 |
PublicationTitle | IEEE transactions on computer-aided design of integrated circuits and systems |
PublicationTitleAbbrev | TCAD |
PublicationYear | 2019 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0014529 |
Score | 2.2795436 |
SourceID | crossref ieee |
SourceType | Enrichment Source Index Database Publisher |
StartPage | 2022 |
SubjectTerms | Dynamic reliability management (DRM) Frequency control Multicore processing Phase detection reinforcement learning Reliability Temperature sensors thermal prediction Throughput |
Title | Enhanced Phase-Driven Q-Learning-Based DRM for Multicore Processors |
URI | https://ieeexplore.ieee.org/document/8500286 |
Volume | 38 |
hasFullText | 1 |
inHoldings | 1 |
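
Illustrative note: the abstract describes detecting program phases and using Q-learning to pick a working mode (ON/OFF or a voltage/frequency level) for each phase, maximizing throughput subject to a reliability constraint. The record does not give the authors' algorithm, so the following is only a minimal sketch of generic tabular Q-learning with the state taken to be the detected phase and the action taken to be the working mode. Everything in it is an assumption made for illustration: the function name `learn_modes_per_phase`, the `THROUGHPUT` and `WEAR` tables, the per-step `wear_budget` check standing in for the long-term reliability constraint, the fixed penalty, and the cyclic phase sequence.

```python
import random

# Hypothetical toy data (not from the paper): rows are phases, columns are
# working modes ordered as [off, low V/f, mid V/f, high V/f].
THROUGHPUT = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.5, 1.0, 2.0],
    [0.0, 1.5, 2.5, 3.5],
]
# Hypothetical reliability "wear" rate of each mode in each phase.
WEAR = [
    [0.0, 0.3, 0.6, 1.2],
    [0.0, 0.2, 0.4, 0.8],
    [0.0, 0.9, 1.1, 1.5],
]


def learn_modes_per_phase(throughput, wear, wear_budget, steps=6000,
                          alpha=0.2, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with state = phase and action = working mode.

    Assumption: the reliability constraint is approximated by a per-step
    wear budget; exceeding it yields a fixed negative reward.
    """
    rng = random.Random(seed)
    n_phases, n_modes = len(throughput), len(throughput[0])
    q = [[0.0] * n_modes for _ in range(n_phases)]
    phase = 0
    for _ in range(steps):
        # Epsilon-greedy choice of working mode for the current phase.
        if rng.random() < epsilon:
            mode = rng.randrange(n_modes)
        else:
            mode = max(range(n_modes), key=lambda a: q[phase][a])
        # Reward: throughput if the mode respects the budget, else a penalty.
        feasible = wear[phase][mode] <= wear_budget
        reward = throughput[phase][mode] if feasible else -5.0
        # Assumption: phases simply repeat in a fixed cycle.
        next_phase = (phase + 1) % n_phases
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_phase])
        q[phase][mode] += alpha * (target - q[phase][mode])
        phase = next_phase
    # Greedy working mode learned for each phase.
    return [max(range(n_modes), key=lambda a: q[s][a])
            for s in range(n_phases)]
```

Calling `learn_modes_per_phase(THROUGHPUT, WEAR, wear_budget=1.0)` returns one mode index per phase. Keeping one Q-table row per phase, rather than per raw sensor state, is one plausible reading of how phase detection could reduce table size, but the memory savings reported in the abstract come from the authors' own design, which this sketch does not reproduce.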