DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory

Bibliographic Details
Published in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) pp. 1129-1143
Main Authors Panwar, Gagandeep, Laghari, Muhammad, Choukse, Esha, Jian, Xun
Format Conference Proceeding
Language English
Published IEEE 29.06.2024
Abstract To expand effective memory capacity, hardware memory compression transparently compresses and packs memory values more densely together in DRAM. This requires introducing a new layer of hardware-managed address translation in the memory controller (MC). However, for large and irregular workloads that already suffer from frequent virtual address translation misses in the TLB, adding another layer of address translation can double the translation misses (e.g., by adding a new miss in the MC per TLB miss). While TLB misses can be drastically reduced by using huge pages, no prior work has explored huge-page-like translation reach for hardware memory compression. Compressing and moving an entire huge page worth of data at a time can lead to huge-page-like address translation, but moving a huge page worth of data together can consume an exorbitant amount of memory bandwidth. This paper explores how to achieve huge-page-like translation performance in this new address translation layer while keeping compression at the page (instead of huge page) granularity. We propose dynamically shortening the translation entries of hot pages to only a few bits per entry by migrating hot pages to the limited number of DRAM locations whose addresses can be encoded using a few bits; colder pages still use the bigger full-length translations so that they can be placed anywhere in memory to fully utilize all the space in memory. Each short translation is tiny (e.g., 2 bits); as such, a 128KB translation cache filled mostly with short translations can achieve similar (e.g., 2GB) total translation reach as a TLB filled entirely with huge page entries. Evaluations show that our idea, Dynamic Length Compressed-Memory Translations (DyLeCT), improves average performance by 10.25% over the prior art.
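The translation-reach figure quoted in the abstract can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper: it assumes 4 KiB base pages and 2 MiB huge pages (common x86-64 sizes that the abstract does not state), treats the cache as holding nothing but 2-bit short translations, and ignores tags and other per-entry overheads. The function name `translation_reach_bytes` is hypothetical.

```python
# Back-of-the-envelope check of the abstract's "128KB cache -> ~2GB reach" claim.
# ASSUMPTIONS (not stated in the abstract): 4 KiB base pages, 2 MiB huge pages,
# a cache filled entirely with 2-bit short translations, no tag/metadata overhead.

CACHE_BYTES = 128 * 1024        # 128KB translation cache (from the abstract)
SHORT_ENTRY_BITS = 2            # short translation size (from the abstract)
PAGE_BYTES = 4 * 1024           # assumed base page size
HUGE_PAGE_BYTES = 2 * 1024**2   # assumed huge page size


def translation_reach_bytes(cache_bytes: int, entry_bits: int, page_bytes: int) -> int:
    """Memory covered when every cached entry maps exactly one page."""
    entries = (cache_bytes * 8) // entry_bits
    return entries * page_bytes


reach = translation_reach_bytes(CACHE_BYTES, SHORT_ENTRY_BITS, PAGE_BYTES)
equivalent_huge_entries = reach // HUGE_PAGE_BYTES

print(f"reach: {reach / 2**30:.1f} GiB")
print(f"equivalent huge-page TLB entries: {equivalent_huge_entries}")
```

Under these assumptions the cache holds 524,288 short entries, which cover 2 GiB, the same reach as 1,024 huge-page entries. A real cache would also hold some full-length translations for colder pages, so the achievable reach would be somewhat lower.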
Author Jian, Xun
Laghari, Muhammad
Panwar, Gagandeep
Choukse, Esha
Author_xml – sequence: 1
  givenname: Gagandeep
  surname: Panwar
  fullname: Panwar, Gagandeep
  email: gpanwar@vt.edu
  organization: Virginia Tech
– sequence: 2
  givenname: Muhammad
  surname: Laghari
  fullname: Laghari, Muhammad
  email: mlaghari@vt.edu
  organization: Virginia Tech
– sequence: 3
  givenname: Esha
  surname: Choukse
  fullname: Choukse, Esha
  email: esha.choukse@microsoft.com
  organization: Microsoft Research
– sequence: 4
  givenname: Xun
  surname: Jian
  fullname: Jian, Xun
  email: xunj@vt.edu
  organization: Virginia Tech
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ISCA59077.2024.00085
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350326581
EndPage 1143
ExternalDocumentID 10609576
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  funderid: 10.13039/100000001
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-ieee_primary_106095763
IEDL.DBID RIE
IngestDate Wed Aug 07 05:31:01 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-ieee_primary_106095763
ParticipantIDs ieee_primary_10609576
PublicationCentury 2000
PublicationDate 2024-June-29
PublicationDateYYYYMMDD 2024-06-29
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-29
  day: 29
PublicationDecade 2020
PublicationTitle 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
PublicationTitleAbbrev ISCA
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
Score 3.8414574
SourceID ieee
SourceType Publisher
StartPage 1129
SubjectTerms address translation
Bandwidth
compression
DRAM
Hardware
hardware memory compression
memory
Memory management
memory subsystem
Random access memory
Title DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory
URI https://ieeexplore.ieee.org/document/10609576
hasFullText 1
inHoldings 1
isFullTextHit
isPrint