DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory

Bibliographic Details
Published in	2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1129-1143
Main Authors	Panwar, Gagandeep; Laghari, Muhammad; Choukse, Esha; Jian, Xun
Format	Conference Proceeding
Language	English
Published	IEEE 29.06.2024
Subjects	address translation; Bandwidth; compression; DRAM; Hardware; hardware memory compression; memory; Memory management; memory subsystem; Random access memory

Abstract	To expand effective memory capacity, hardware memory compression transparently compresses and packs memory values more densely together in DRAM. This requires introducing a new layer of hardware-managed address translation in the memory controller (MC). However, for large and irregular workloads that already suffer from frequent virtual address translation misses in the TLB, adding an additional layer of address translation can double the translation misses (e.g., by adding a new miss in the MC per TLB miss). While TLB misses can be drastically reduced by using huge pages, no prior work has explored huge-page-like translation reach for hardware memory compression. While compressing and moving an entire huge page worth of data at a time can lead to huge-page-like address translation, moving a huge page worth of data together can consume an exorbitant amount of memory bandwidth. This paper explores how to achieve huge-page-like translation performance in this new address translation layer, while keeping compression at the page (instead of huge page) granularity. We propose dynamically shortening the translation entries of hot pages to only a few bits per entry by migrating hot pages to the limited number of DRAM locations whose addresses can be encoded using a few bits; colder pages still use the bigger full-length translations so that colder pages can be placed anywhere in memory to fully utilize all the space in memory. Each short translation is tiny (e.g., 2 bits); as such, a 128KB translation cache filled mostly with short translations can achieve similar (e.g., 2GB) total translation reach as a TLB filled entirely with huge page entries. Evaluations show that our idea, Dynamic Length Compressed-Memory Translations (DyLeCT), improves average performance by 10.25% over the prior art.
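The reach figure quoted in the abstract (a 128KB translation cache of 2-bit entries covering about 2GB) can be checked with simple arithmetic. The sketch below is a back-of-the-envelope illustration only, not the paper's model: the 4 KiB page size and the 32-bit full-length entry used for comparison are assumptions that do not appear in the abstract, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the translation-reach figure in the abstract.
# ASSUMPTIONS (not stated in the abstract): 4 KiB base pages, and a 32-bit
# full-length entry used purely as a point of comparison.

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def translation_reach_bytes(cache_bytes: int, bits_per_entry: int, page_bytes: int) -> int:
    """Memory covered by a translation cache filled with fixed-size entries."""
    entries = (cache_bytes * 8) // bits_per_entry  # how many entries fit in the cache
    return entries * page_bytes                    # each entry maps one page


# 128 KiB cache filled entirely with 2-bit short translations.
short_reach = translation_reach_bytes(128 * KIB, bits_per_entry=2, page_bytes=4 * KIB)

# Same cache filled with hypothetical 32-bit full-length translations.
full_reach = translation_reach_bytes(128 * KIB, bits_per_entry=32, page_bytes=4 * KIB)

print(short_reach / GIB)  # 2.0
print(full_reach / MIB)   # 128.0
```

Under these assumptions the short entries give 16 times the reach of the assumed full-length entries, which is why keeping hot pages in short-translation locations matters for the cache's effective coverage.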
Author	Jian, Xun; Laghari, Muhammad; Panwar, Gagandeep; Choukse, Esha
Author_xml	1. Panwar, Gagandeep (gpanwar@vt.edu, Virginia Tech); 2. Laghari, Muhammad (mlaghari@vt.edu, Virginia Tech); 3. Choukse, Esha (esha.choukse@microsoft.com, Microsoft Research); 4. Jian, Xun (xunj@vt.edu, Virginia Tech)
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/ISCA59077.2024.00085
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798350326581
EndPage	1143
ExternalDocumentID	10609576
Genre	orig-research
GrantInformation_xml	– fundername: National Science Foundation funderid: 10.13039/100000001
GroupedDBID	6IE 6IH CBEJK RIE RIO
ID	FETCH-ieee_primary_106095763
IEDL.DBID	RIE
IngestDate	Wed Aug 07 05:31:01 EDT 2024
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-ieee_primary_106095763
ParticipantIDs	ieee_primary_10609576
PublicationCentury	2000
PublicationDate	2024-June-29
PublicationDateYYYYMMDD	2024-06-29
PublicationDate_xml	– month: 06 year: 2024 text: 2024-June-29 day: 29
PublicationDecade	2020
PublicationTitle	2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
PublicationTitleAbbrev	ISCA
PublicationYear	2024
Publisher	IEEE
Publisher_xml	– name: IEEE
Score	3.8414574
SourceID	ieee
SourceType	Publisher
StartPage	1129
Title	DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory
URI	https://ieeexplore.ieee.org/document/10609576
hasFullText	1
inHoldings	1
isFullTextHit
isPrint