Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs

Bibliographic Details
Published in IEICE Transactions on Information and Systems, Vol. E101.D, no. 9, pp. 2247-2257
Main Authors IMAMURA, Satoshi; YASUI, Yuichiro; INOUE, Koji; ONO, Takatsugu; SASAKI, Hiroshi; FUJISAWA, Katsuki
Format Journal Article
Language English
Published Tokyo: The Institute of Electronics, Information and Communication Engineers; Japan Science and Technology Agency, 2018-09-01

Abstract The power consumption of server platforms has been rising as they are equipped with more hardware resources. In particular, DRAM capacity continues to grow, and it is not rare for DRAM to consume more power than the processors on modern servers. Reducing DRAM energy consumption is therefore critical to reducing system-level energy consumption. Although improving row buffer locality (RBL) and bank-level parallelism (BLP) is well known to be effective in reducing DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied on modern servers with multiple memory channels, hurt the RBL that each benchmark potentially possesses. To address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that the row-grained scheme reduces DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the three cache line-grained schemes, respectively.
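The difference between the two interleaving granularities named in the abstract can be illustrated with a small address-mapping sketch. This is not the mapping evaluated in the paper: the 64-byte cache line, 8 KiB row, four channels, and the omission of rank and bank bits are all assumptions made for illustration, and the names map_address and rows_touched are hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumed cache line size
ROW_BYTES = 8 * 1024    # assumed DRAM row size per channel
NUM_CHANNELS = 4        # assumed number of memory channels


def map_address(addr, grain_bytes, num_channels=NUM_CHANNELS):
    """Return (channel, row) for a physical address (hypothetical mapping).

    Consecutive blocks of `grain_bytes` rotate across the channels, and the
    bytes that land on one channel are packed contiguously into its rows.
    Rank and bank bits are ignored to keep the sketch small.
    """
    block = addr // grain_bytes
    channel = block % num_channels
    # Offset of this address inside the channel's own address space.
    local_addr = (block // num_channels) * grain_bytes + addr % grain_bytes
    row = local_addr // ROW_BYTES
    return channel, row


def rows_touched(start, length, grain_bytes):
    """Set of (channel, row) pairs hit by reading `length` bytes from `start`."""
    return {
        map_address(addr, grain_bytes)
        for addr in range(start, start + length, CACHE_LINE_BYTES)
    }


if __name__ == "__main__":
    start = 5 * ROW_BYTES  # an arbitrary row-aligned 8 KiB block
    fine = rows_touched(start, ROW_BYTES, CACHE_LINE_BYTES)
    coarse = rows_touched(start, ROW_BYTES, ROW_BYTES)
    print("cache line-grained:", len(fine), "rows opened")
    print("row-grained:", len(coarse), "rows opened")
```

Under these assumptions, reading one contiguous 8 KiB block opens a row on every channel with cache line-grained interleaving, so each activated row serves only a quarter of the block, whereas row-grained interleaving serves the whole block from a single open row. When many threads issue such accesses concurrently, the former leaves fewer accesses per row activation, which is the locality loss the abstract describes.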
Author_xml – sequence: 1
  fullname: IMAMURA, Satoshi
  organization: Fujitsu Laboratories Ltd
– sequence: 2
  fullname: YASUI, Yuichiro
  organization: Institute of Mathematics for Industry, Kyushu University
– sequence: 3
  fullname: INOUE, Koji
  organization: Faculty of Information Science and Electrical Engineering, Kyushu University
– sequence: 4
  fullname: ONO, Takatsugu
  organization: Faculty of Information Science and Electrical Engineering, Kyushu University
– sequence: 5
  fullname: SASAKI, Hiroshi
  organization: Department of Computer Science, Columbia University
– sequence: 6
  fullname: FUJISAWA, Katsuki
  organization: Institute of Mathematics for Industry, Kyushu University
ContentType Journal Article
Copyright 2018 The Institute of Electronics, Information and Communication Engineers
Copyright Japan Science and Technology Agency 2018
DOI 10.1587/transinf.2017EDP7296
Discipline Engineering
Computer Science
EISSN 1745-1361
EndPage 2257
ISSN 0916-8532
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
OpenAccessLink https://www.jstage.jst.go.jp/article/transinf/E101.D/9/E101.D_2017EDP7296/_article/-char/en
PageCount 11
PublicationCentury 2000
PublicationDate 2018-09-01
PublicationDateYYYYMMDD 2018-09-01
PublicationDate_xml – month: 09
  year: 2018
  text: 2018-09-01
  day: 01
PublicationDecade 2010
PublicationPlace Tokyo
PublicationPlace_xml – name: Tokyo
PublicationTitle IEICE Transactions on Information and Systems
PublicationTitleAlternate IEICE Trans. Inf. & Syst.
PublicationYear 2018
Publisher The Institute of Electronics, Information and Communication Engineers
Japan Science and Technology Agency
Publisher_xml – name: The Institute of Electronics, Information and Communication Engineers
– name: Japan Science and Technology Agency
StartPage 2247
SubjectTerms address mapping schemes
Benchmarks
DRAM
Energy consumption
energy efficiency
Energy management
Power consumption
Power management
Title Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs
URI https://www.jstage.jst.go.jp/article/transinf/E101.D/9/E101.D_2017EDP7296/_article/-char/en
https://www.proquest.com/docview/2303806782
Volume E101.D
ispartofPNX IEICE Transactions on Information and Systems, 2018/09/01, Vol.E101.D(9), pp.2247-2257