Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs
Published in | IEICE Transactions on Information and Systems Vol. E101.D; no. 9; pp. 2247 - 2257 |
---|---|
Main Authors | IMAMURA, Satoshi; YASUI, Yuichiro; INOUE, Koji; ONO, Takatsugu; SASAKI, Hiroshi; FUJISAWA, Katsuki |
Format | Journal Article |
Language | English |
Published | Tokyo: The Institute of Electronics, Information and Communication Engineers, 01.09.2018; Japan Science and Technology Agency |
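Subjects | address mapping schemes; Benchmarks; DRAM; Energy consumption; energy efficiency; Energy management; Power consumption; Power management |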
Abstract | The power consumption of server platforms has been increasing as they are equipped with more hardware resources. In particular, DRAM capacity continues to grow, and it is not rare for DRAM to consume more power than the processors on modern servers. Reducing DRAM energy consumption is therefore a critical challenge in reducing system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective in reducing DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied on modern servers with multiple memory channels, hurt the RBL that each benchmark potentially possesses. To address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the three cache line-grained schemes, respectively. |
---|---|
Author | IMAMURA, Satoshi; YASUI, Yuichiro; INOUE, Koji; ONO, Takatsugu; SASAKI, Hiroshi; FUJISAWA, Katsuki |
Author_xml | 1. IMAMURA, Satoshi (Fujitsu Laboratories Ltd); 2. YASUI, Yuichiro (Institute of Mathematics for Industry, Kyushu University); 3. INOUE, Koji (Faculty of Information Science and Electrical Engineering, Kyushu University); 4. ONO, Takatsugu (Faculty of Information Science and Electrical Engineering, Kyushu University); 5. SASAKI, Hiroshi (Department of Computer Science, Columbia University); 6. FUJISAWA, Katsuki (Institute of Mathematics for Industry, Kyushu University) |
ContentType | Journal Article |
Copyright | 2018 The Institute of Electronics, Information and Communication Engineers; Copyright Japan Science and Technology Agency 2018 |
Copyright_xml | – notice: 2018 The Institute of Electronics, Information and Communication Engineers – notice: Copyright Japan Science and Technology Agency 2018 |
DOI | 10.1587/transinf.2017EDP7296 |
DatabaseName | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Computer and Information Systems Abstracts |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering; Computer Science |
EISSN | 1745-1361 |
EndPage | 2257 |
ISSN | 0916-8532 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 9 |
Language | English |
OpenAccessLink | https://www.jstage.jst.go.jp/article/transinf/E101.D/9/E101.D_2017EDP7296/_article/-char/en |
PageCount | 11 |
PublicationCentury | 2000 |
PublicationDate | 2018-09-01 |
PublicationDateYYYYMMDD | 2018-09-01 |
PublicationDate_xml | – month: 09 year: 2018 text: 2018-09-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Tokyo |
PublicationPlace_xml | – name: Tokyo |
PublicationTitle | IEICE Transactions on Information and Systems |
PublicationTitleAlternate | IEICE Trans. Inf. & Syst. |
PublicationYear | 2018 |
Publisher | The Institute of Electronics, Information and Communication Engineers; Japan Science and Technology Agency |
Publisher_xml | – name: The Institute of Electronics, Information and Communication Engineers – name: Japan Science and Technology Agency |
StartPage | 2247 |
SubjectTerms | address mapping schemes; Benchmarks; DRAM; Energy consumption; energy efficiency; Energy management; Power consumption; Power management |
Title | Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs |
URI | https://www.jstage.jst.go.jp/article/transinf/E101.D/9/E101.D_2017EDP7296/_article/-char/en https://www.proquest.com/docview/2303806782 |
Volume | E101.D |
ispartofPNX | IEICE Transactions on Information and Systems, 2018/09/01, Vol.E101.D(9), pp.2247-2257 |
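
The abstract's contrast between cache line-grained and row-grained channel interleaving can be illustrated with a small toy model. The sketch below is not the simulator or the configuration used in the paper: the 64-byte line, 8 KiB row, four channels, one bank per channel, and the synthetic "bursty" trace are all assumptions chosen for illustration, and the function names are hypothetical. Real memory controllers also map ranks and banks and may hash address bits. The model only shows why a short run of consecutive cache lines tends to hit the open row under row-grained interleaving but is scattered across channels under cache line-grained interleaving.

```python
# Toy model with assumed parameters (not the paper's simulator settings).
LINE_BYTES = 64                      # cache line size (assumed)
ROW_BYTES = 8192                     # DRAM row size per channel (assumed)
CHANNELS = 4                         # number of memory channels (assumed)
LINES_PER_ROW = ROW_BYTES // LINE_BYTES


def map_line_grained(addr):
    """Cache line-grained: consecutive cache lines rotate across channels."""
    line = addr // LINE_BYTES
    channel = line % CHANNELS
    row = (line // CHANNELS) // LINES_PER_ROW
    return channel, row


def map_row_grained(addr):
    """Row-grained: consecutive row-sized blocks rotate across channels."""
    block = addr // ROW_BYTES
    channel = block % CHANNELS
    row = block // CHANNELS
    return channel, row


def row_buffer_hit_rate(addresses, mapping):
    """Fraction of accesses whose row is already open (one bank per channel)."""
    open_row = {}                    # channel -> currently open row
    hits = 0
    total = 0
    for addr in addresses:
        channel, row = mapping(addr)
        if open_row.get(channel) == row:
            hits += 1
        open_row[channel] = row      # the accessed row stays open afterwards
        total += 1
    return hits / total if total else 0.0


def bursty_trace(bursts=1000, lines_per_burst=4, stride=ROW_BYTES * 129):
    """Synthetic trace: short runs of consecutive lines at far-apart addresses."""
    return [k * stride + j * LINE_BYTES
            for k in range(bursts)
            for j in range(lines_per_burst)]


if __name__ == "__main__":
    trace = bursty_trace()
    print("line-grained RBL:", row_buffer_hit_rate(trace, map_line_grained))
    print("row-grained  RBL:", row_buffer_hit_rate(trace, map_row_grained))
```

In this toy trace, each four-line burst is split over four channels under the line-grained mapping, so no access finds its row open, whereas the row-grained mapping keeps the burst in one row and only the first access of each burst misses. The numbers say nothing about the energy savings reported in the abstract, which depend on the benchmarks and DRAM power model the authors used.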