GPU Accelerating for Rapid Multi-core Cache Simulation
Published in | 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum pp. 1387 - 1396 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2011 |
Subjects | |
Summary: | To find the best memory system for emerging workloads, traces are collected during an application's execution, and caches with different configurations are then simulated using these traces. Because program traces can be several gigabytes in size, simulating cache performance is time-consuming. Compute Unified Device Architecture (CUDA) is a software development platform that enables programmers to accelerate general-purpose applications on the graphics processing unit (GPU). This paper presents a real-time multi-core cache simulator, built on the Pin tool to capture memory references, and a fast method for multi-core cache simulation on a CUDA-enabled GPU. The proposed method is accelerated by three techniques: execution parallelism exploration, memory latency hiding, and a novel trace compression methodology. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the hybrid parallel method presented here, which combines time-partitioning with set-partitioning, achieves an 11.10× speedup over the serial CPU simulation algorithm. The simulator can characterize the cache performance of single-threaded or multi-threaded workloads at speeds of 6-15 MIPS. It can simulate 6 cache configurations in a single pass at these speeds, whereas CMP$im can simulate only one cache configuration per pass, at speeds of 4-10 MIPS. |
---|---|
ISBN: | 9781612844251 1612844251 |
ISSN: | 1530-2075 |
DOI: | 10.1109/IPDPS.2011.295 |
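
The set-partitioning idea named in the summary can be illustrated with a small CPU-side sketch. This is not the paper's CUDA implementation. It only assumes a set-associative cache with LRU replacement (the record does not state the replacement policy). The function names and the `line_size` default are hypothetical. The point it shows is that references mapping to different cache sets never interact, so each set's sub-trace can be simulated independently, for example by one GPU thread per set, and the miss counts summed.

```python
from collections import OrderedDict, defaultdict


def _count_misses(tags, ways):
    """Simulate one LRU cache set over an ordered list of tags."""
    lru = OrderedDict()  # ordered oldest -> most recently used
    misses = 0
    for tag in tags:
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used tag
            lru[tag] = True
    return misses


def simulate_serial(trace, num_sets, ways, line_size=64):
    """Reference simulation: walk the whole trace in program order."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        lru, tag = sets[block % num_sets], block // num_sets
        if tag in lru:
            lru.move_to_end(tag)
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[tag] = True
    return misses


def simulate_set_partitioned(trace, num_sets, ways, line_size=64):
    """Split the trace by set index, then simulate each set on its own."""
    buckets = defaultdict(list)
    for addr in trace:                   # order within each set is preserved
        block = addr // line_size
        buckets[block % num_sets].append(block // num_sets)
    # Each bucket is independent work; this loop is the part that a
    # parallel implementation would spread across threads.
    return sum(_count_misses(tags, ways) for tags in buckets.values())


trace = [0, 64, 128, 0, 256, 64]
print(simulate_serial(trace, num_sets=2, ways=2))
print(simulate_set_partitioned(trace, num_sets=2, ways=2))
```

Both calls report the same miss count, since partitioning by set only reorders work between sets and never within a set. Time-partitioning, the other half of the hybrid method, splits the trace into consecutive time slices and is not covered by this sketch.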