Off-chip prefetching based on Hidden Markov Model for non-volatile memory architectures

Bibliographic Details
Published in	PLoS ONE, Vol. 16, no. 9, p. e0257047
Main Authors	Lamela, Adrián; Ossorio, Óscar G.; Vinuesa, Guillermo; Sahelices, Benjamín
Format	Journal Article
Language	English
Published	San Francisco: Public Library of Science (PLoS), 14.09.2021
Subjects	Algorithms; Analysis; Bandwidths; Chips (memory devices); Clustering; Complexity; Computer and Information Sciences; Computer applications; Computer architecture; Computer memory; Computer science; Dynamic cell; Dynamic random access memory; Efficiency; Engineering; Engineering and Technology; Informatics; Latency; Markov chains; Markov processes; Microprocessors; Normal distribution; Physical sciences; Research and Analysis Methods; Simulation; Technology; Spain
Online Access	Get full text

More Information
Summary:	Non-volatile memory technology is now available in commodity hardware. This technology can be used as a backup memory for an external DRAM cache memory without needing to modify the software. However, the higher read and write latencies of non-volatile memory may exacerbate the memory wall problem. In this work we present a novel off-chip prefetch technique based on a Hidden Markov Model that specifically deals with the latency problem caused by the complexity of off-chip memory access patterns. First, we present a thorough analysis of off-chip memory access patterns to characterize their complexity in multicore processors. Based on this study, we propose a prefetching module located in the last-level cache (LLC) that uses two small tables and whose computational complexity is linear in the number of computing threads. Our Markov-based technique can track and cluster several simultaneous groups of memory accesses coming from multiple simultaneous threads in a multicore processor. It can quickly identify complex address groups and trigger prefetches with very high accuracy. Our simulations show an improvement of up to 76% in the hit ratio of an off-chip DRAM cache for a multicore architecture over the conventional prefetch technique (G/DC). Also, the overhead of prefetch requests (failed prefetches) is reduced by 48% in single-core simulations and by 83% in multicore simulations.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0257047