Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency
Published in | Journal of Computer Science and Technology, Vol. 27, no. 4, pp. 769-780 |
---|---|
Main Author | Zhen-Hao Zhang |
Format | Journal Article |
Language | English |
Published | Boston: Springer US, 01.07.2012; Springer Nature B.V. |
Affiliations | Microprocessor Research and Development Center, Peking University, Beijing 100871, China; Engineering Research Center of Microprocessor and System, Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China |
ISSN | 1000-9000 1860-4749 |
DOI | 10.1007/s11390-012-1263-7 |
Summary: | Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load order violations. In-flight forwarding, however, accounts for only about 15% of all store-load communication, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (short for active store window). It provides a new structure, the speculative active store window, that forwards from stores to loads more aggressively and more speculatively than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which is referred to as far forwarding in this paper. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by the tagged SSBF (short for store sequence bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged store sequence bloom filter are both set-associative structures, which are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional load/store queue based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively. |
---|---|
Bibliography: | Keywords: store-load forwarding, load/store queue, value-based load re-execution. Authors: Zhen-Hao Zhang, Xiao-Yin Wang, Dong Tong, Jiang-Fang Yi, Jun-Lin Lu, Ke-Yi Wang (Microprocessor Research and Development Center, Peking University, Beijing 100871, China; Engineering Research Center of Microprocessor and System, Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China). 11-2296/TP |
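
The summary describes two ideas that a small model may make more concrete: a set-associative table that can hand a committed store's value to a later load without a cache access (far forwarding), and a value-based check that re-executes the load at commit to confirm the speculative value. The Python sketch below is a toy illustration under stated assumptions, not the paper's design: `StoreWindow`, `execute_load`, `verify_load`, the modulo index function, and the oldest-first replacement are all hypothetical names and choices, and the tagged SSBF filter is left out, so every load is re-checked.

```python
from collections import OrderedDict


class StoreWindow:
    """Toy set-associative table of recent stores (hypothetical model)."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # One insertion-ordered dict per set, mapping address -> value.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, addr):
        # Assumed index function: address modulo the number of sets.
        return self.sets[addr % self.num_sets]

    def record_store(self, addr, value):
        entries = self._set_for(addr)
        entries.pop(addr, None)       # re-insert so this address becomes newest
        entries[addr] = value
        if len(entries) > self.ways:  # set is full: evict the oldest entry
            entries.popitem(last=False)

    def lookup(self, addr):
        """Return the recorded value for addr, or None on a miss."""
        return self._set_for(addr).get(addr)


def execute_load(window, memory, addr):
    """Speculatively execute a load: try the window first, then memory."""
    value = window.lookup(addr)
    if value is not None:
        return value, "forwarded"
    return memory.get(addr, 0), "cache"


def verify_load(memory, addr, speculative_value):
    """Value-based check at commit: re-read committed memory and compare."""
    return memory.get(addr, 0) == speculative_value


memory = {}                             # stands in for the L1 data cache
window = StoreWindow(num_sets=2, ways=1)

# A store to address 8 commits; the window still remembers its value.
memory[8] = 42
window.record_store(8, 42)
far_hit = execute_load(window, memory, 8)

# A load of address 9 runs before an older store to 9 has executed ...
early_value, early_source = execute_load(window, memory, 9)
memory[9] = 5                           # ... then that older store commits
window.record_store(9, 5)
early_ok = verify_load(memory, 9, early_value)
```

In this model `far_hit` is served from the window rather than from `memory`, while `early_ok` comes back false because the load ran ahead of an older store to the same address; a real pipeline would respond to such a failed check by squashing and replaying the load. Because the window is set-associative, a lookup only probes one small set instead of comparing against every entry, which is the scalability argument the summary makes against CAM-based queues.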