DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory

Bibliographic Details
Published in	2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) pp. 1129 - 1143
Main Authors	Panwar, Gagandeep, Laghari, Muhammad, Choukse, Esha, Jian, Xun
Format	Conference Proceeding
Language	English
Published	IEEE 29.06.2024
Subjects	address translation; Bandwidth; compression; DRAM; Hardware; hardware memory compression; memory; Memory management; memory subsystem; Random access memory
Summary:	To expand effective memory capacity, hardware memory compression transparently compresses and packs memory values more densely together in DRAM. This requires introducing a new layer of hardware-managed address translation in the memory controller (MC). However, for large and irregular workloads that already suffer from frequent virtual address translation misses in the TLB, adding another layer of address translation can double the translation misses (e.g., by adding a new miss in the MC per TLB miss). While TLB misses can be drastically reduced by using huge pages, no prior work has explored huge-page-like translation reach for hardware memory compression. While compressing and moving an entire huge page worth of data at a time can lead to huge-page-like address translation, moving a huge page worth of data together can consume an exorbitant amount of memory bandwidth. This paper explores how to achieve huge-page-like translation performance in this new address translation layer, while keeping compression at the page (instead of huge page) granularity. We propose dynamically shortening the translation entries of hot pages to only a few bits per entry by migrating hot pages to the limited number of DRAM locations whose addresses can be encoded using a few bits; colder pages still use the bigger full-length translations so that colder pages can be placed anywhere in memory to fully utilize all the space in memory. Each short translation is tiny (e.g., 2 bits); as such, a 128KB translation cache filled mostly with short translations can achieve similar (e.g., 2GB) total translation reach as a TLB filled entirely with huge page entries. Evaluations show that our idea, Dynamic Length Compressed-Memory Translations (DyLeCT), improves average performance by 10.25% over the prior art.
DOI:	10.1109/ISCA59077.2024.00085
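
The translation-reach figure quoted in the summary (a 128KB cache of 2-bit entries reaching about 2GB) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 4KB page size is an assumption not stated in the summary, the cache is idealized as holding nothing but short entries, and the function name is hypothetical rather than taken from the paper.

```python
KIB = 1024
GIB = 1024 ** 3


def translation_reach_bytes(cache_bytes: int, bits_per_entry: int, page_bytes: int) -> int:
    """Memory covered by a translation cache holding only fixed-size entries.

    Hypothetical helper for illustration; it ignores tags and other metadata.
    """
    entries = (cache_bytes * 8) // bits_per_entry  # entries that fit in the cache
    return entries * page_bytes                    # each entry maps one page


# Figures from the summary: 128KB cache, 2-bit short translations.
# Assumption: 4KB pages (the summary does not state the page size).
reach = translation_reach_bytes(128 * KIB, bits_per_entry=2, page_bytes=4 * KIB)
print(reach / GIB)  # 2.0
```

Under these assumptions the cache holds 524,288 short entries, each covering one 4KB page, which gives the 2GB reach mentioned in the summary.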