Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

Bibliographic Details
Published in Journal of Computer Science and Technology Vol. 27; no. 4; pp. 769-780
Main Authors Zhen-Hao Zhang (张栚滈), Xiao-Yin Wang (王箫音), Dong Tong (佟冬), Jiang-Fang Yi (易江芳), Jun-Lin Lu (陆俊林), Ke-Yi Wang (王克义)
Format Journal Article
Language English
Published Boston Springer US 01.07.2012
Springer Nature B.V
Microprocessor Research and Development Center, Peking University, Beijing 100871, China; Engineering Research Center of Microprocessor and System, Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
ISSN 1000-9000
1860-4749
DOI 10.1007/s11390-012-1263-7

Summary: Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load order violations. However, this in-flight forwarding accounts for only about 15% of all store-load communications, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (active store window). It provides a new structure, the speculative active store window, that implements more aggressively speculative store-load forwarding than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which the paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (store sequence Bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged SSBF are both set-associative structures, which are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional LSQ-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively.
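The filtering step named in the summary can be illustrated with a small software model. The sketch below is not the paper's design: it assumes each committed store carries a monotonically increasing store sequence number (SSN), that the filter is indexed at 8-byte granularity, and that a load re-executes only when the last store recorded for its address is younger than the store the load actually observed. The class name `TaggedSSBF`, its parameters, and the per-set `evicted_ssn` fallback are all hypothetical choices made for illustration.

```python
class TaggedSSBF:
    """Toy set-associative filter: address tag -> SSN of the last committed store.

    Illustrative model only; sizes, hashing, and replacement are assumptions.
    """

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds (tag, ssn) pairs, least recently written first.
        self.sets = [[] for _ in range(num_sets)]
        # Highest SSN ever evicted from each set, used as a safe fallback.
        self.evicted_ssn = [0] * num_sets

    def _split(self, addr):
        word = addr >> 3  # assumed 8-byte tracking granularity
        return word % self.num_sets, word // self.num_sets

    def commit_store(self, addr, ssn):
        """Record that the store with sequence number `ssn` wrote `addr`."""
        idx, tag = self._split(addr)
        entries = [e for e in self.sets[idx] if e[0] != tag]
        entries.append((tag, ssn))
        if len(entries) > self.ways:
            _, old_ssn = entries.pop(0)
            self.evicted_ssn[idx] = max(self.evicted_ssn[idx], old_ssn)
        self.sets[idx] = entries

    def last_store_ssn(self, addr):
        """SSN of the last store known for `addr`, conservative on a tag miss."""
        idx, tag = self._split(addr)
        for t, ssn in self.sets[idx]:
            if t == tag:
                return ssn
        return self.evicted_ssn[idx]

    def must_reexecute(self, addr, observed_ssn):
        """True if a store younger than the one the load observed wrote `addr`."""
        return self.last_store_ssn(addr) > observed_ssn


ssbf = TaggedSSBF()
ssbf.commit_store(0x100, ssn=1)
print(ssbf.must_reexecute(0x100, observed_ssn=1))  # False: load saw store 1
print(ssbf.must_reexecute(0x100, observed_ssn=0))  # True: load missed store 1
```

Because only the indexed set is searched, a lookup compares at most `ways` tags instead of every entry, which is the scalability argument the summary makes for set-associative structures. On a tag miss this model falls back to the highest evicted SSN in the set, so it may request an unnecessary re-execution but never skips a required one.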
Bibliography: store-load forwarding, load/store queue, value-based load re-execution
Zhen-Hao Zhang, Xiao-Yin Wang, Dong Tong, Jiang-Fang Yi, Jun-Lin Lu, Ke-Yi Wang (Microprocessor Research and Development Center, Peking University, Beijing 100871, China; Engineering Research Center of Microprocessor and System, Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
11-2296/TP