Bypass and insertion algorithms for exclusive last-level caches

Inclusive last-level caches (LLCs) waste precious silicon estate due to cross-level replication of cache blocks. As the industry moves toward cache hierarchies with larger inner levels, this wasted cache space leads to bigger performance losses compared to exclusive LLCs. However, exclusive LLCs mak...

Full description

Saved in:

Bibliographic Details
Published in	2011 38th Annual International Symposium on Computer Architecture (ISCA) pp. 81 - 92
Main Authors	Gaur, Jayesh, Chaudhuri, Mainak, Subramoney, Sreenivas
Format	Conference Proceeding
Language	English
Published	New York, NY, USA ACM 04.06.2011 IEEE
Series	ACM Conferences
Subjects	Algorithm design and analysis bypass policy Coherence Exclusive last-level cache Hardware Hardware > Integrated circuits > Semiconductor memory > Dynamic memory insertion policy Prefetching Proposals Resource management Servers bypass policy exclusive last-level cache insertion policy
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Inclusive last-level caches (LLCs) waste precious silicon estate due to cross-level replication of cache blocks. As the industry moves toward cache hierarchies with larger inner levels, this wasted cache space leads to bigger performance losses compared to exclusive LLCs. However, exclusive LLCs make the design of replacement policies more challenging. While in an inclusive LLC a block can gather a filtered access history, this is not possible in an exclusive design because the block is de-allocated from the LLC on a hit. As a result, the popular least-recently-used replacement policy and its approximations are rendered ineffective and proper choice of insertion ages of cache blocks becomes even more important in exclusive designs. On the other hand, it is not necessary to fill every block into an exclusive LLC. This is known as selective cache bypassing and is not possible to implement in an inclusive LLC because that would violate inclusion. This paper explores insertion and bypass algorithms for exclusive LLCs. Our detailed execution-driven simulation results show that a combination of our best insertion and bypass policies delivers an improvement of up to 61.2% and on average (geometric mean) 3.4% in terms of instructions retired per cycle (IPC) for 97 single-threaded dynamic instruction traces spanning selected SPEC 2006 and server applications, running on a 2 MB 16-way exclusive LLC compared to a baseline exclusive design in the presence of well-tuned multi-stream hardware prefetchers. The corresponding improvements in throughput for 35 4-way multi-programmed workloads running with an 8 MB 16-way shared exclusive LLC are 20.6% (maximum) and 2.5% (geometric mean).
ISBN:	9781450304726 1450304729
ISSN:	1063-6897 2575-713X
DOI:	10.1145/2000064.2000075