Approximate Memory Compression

Bibliographic Details
Published in	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, no. 4, pp. 980-991
Main Authors	Ranjan, Ashish, Raha, Arnab, Raghunathan, Vijay, Raghunathan, Anand
Format	Journal Article
Language	English
Published	New York: IEEE, 01.04.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Approximate memories; Computation; Controllers; Data structures; DRAM; Field programmable gate arrays; Hardware; Machine learning; main memory; memory compression; Micromechanical devices; Microprocessors; Program processors; Quality control; Random access memory; Runtime; spin-transfer torque magnetic RAM (STT-MRAM); Traffic congestion
ISSN	1063-8210 1557-9999
DOI	10.1109/TVLSI.2020.2970041

Summary:	Memory subsystems are a major energy bottleneck in computing platforms due to frequent transfers between processors and off-chip memory. We propose approximate memory compression, a technique that leverages the intrinsic resilience of emerging workloads such as machine learning and data analytics to reduce off-chip memory traffic, thereby improving energy and performance. We realize approximate memory compression by enhancing the memory controller to be aware of approximate memory regions, i.e., regions in memory that contain approximation-resilient data, and to transparently compress (decompress) the data written to (read from) these regions. To provide control over approximations, each approximate memory region is associated with an error constraint such as the maximum error that may be introduced in each data element. The quality-aware memory controller subjects memory transactions to a compression scheme that introduces approximations, thereby reducing memory traffic, while adhering to the specified error constraint for each approximate memory region. A software interface is provided to allow programmers to identify data structures (DSs) that are resilient to approximations. A runtime quality control framework automatically determines the error constraints for the identified DSs such that a given target application-level quality is maintained. We evaluate our proposal by applying it to three different main memory technologies in the context of a general-purpose computing system: DDR3 DRAM, LPDDR3 DRAM, and spin-transfer torque magnetic RAM (STT-MRAM). To demonstrate the feasibility of the proposed concepts, we also implement a hardware prototype using the Intel UniPHY-DDR3 memory controller and Nios-II processor, a Hynix DDR3 DRAM module, and a Stratix-IV field-programmable gate array (FPGA) development board.
Across a wide range of machine learning benchmarks, approximate memory compression obtains significant benefits in main memory energy (1.18× for DDR3 DRAM, 1.52× for LPDDR3 DRAM, and 2.0× for STT-MRAM) and a simultaneous improvement in execution time (5.2% for DDR3 DRAM, 5.4% for LPDDR3 DRAM, and 9.3% for STT-MRAM) with nearly identical application output quality.
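
Illustrative sketch (not from the article): the summary does not say which compression scheme the memory controller uses, so the Python below substitutes plain uniform quantization as a stand-in to show how a per-element maximum-error constraint can bound the approximation while shrinking the stored data. The function names, the sample values, and the 0.05 error bound are all hypothetical.

```python
def compress(values, max_err):
    """Map each value to an integer code using a step of 2 * max_err.

    Assumption: uniform quantization stands in for the article's
    (unspecified) compression scheme.
    """
    step = 2.0 * max_err
    return [round(v / step) for v in values]


def decompress(codes, max_err):
    """Reconstruct approximate values from the integer codes."""
    step = 2.0 * max_err
    return [c * step for c in codes]


def bits_per_code(codes):
    """Fixed signed width (in bits) that can hold every code."""
    span = max(abs(c) for c in codes)
    return max(1, span.bit_length() + 1)


# Hypothetical approximation-resilient data, e.g. a few model weights.
weights = [0.12, -0.53, 0.98, 0.31]
max_err = 0.05  # hypothetical error constraint for this memory region

codes = compress(weights, max_err)
restored = decompress(codes, max_err)

# Rounding to the nearest code keeps every element within max_err
# (up to floating-point rounding).
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, bits_per_code(codes), worst)
```

Rounding to the nearest multiple of the step moves a value by at most half a step, so the reconstruction error stays within the bound. A looser bound gives smaller codes and therefore less data to transfer, which is the trade-off that the summary's runtime quality control framework is described as tuning.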