Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Bibliographic Details
Published in: 33rd International Symposium on Computer Architecture (ISCA'06), pp. 216-226
Main Authors: Colohan, Christopher B.; Ailamaki, Anastassia; Steffan, J. Gregory; Mowry, Todd C.
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 01.05.2006
Publisher: IEEE
Series: ACM Conferences
More Information
Summary: Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads. It targets speculative threads that range in size from hundreds to several thousand dynamic instructions and have minimal dependences between them. Recent work has shown that TLS can offer compelling performance improvements for database workloads, but only when targeting much larger speculative threads of more than 50,000 dynamic instructions per thread, with frequent data dependences between them. To support such large and dependent speculative threads, hardware must be able to buffer the additional speculative state, and must also address the more challenging problem of tolerating the resulting cross-thread data dependences. In this paper we present hardware support for large speculative threads that integrates several previous proposals for TLS hardware. We also introduce support for sub-threads: a mechanism for tolerating cross-thread data dependences by checkpointing speculative execution. When speculation fails due to a violated data dependence, sub-threads allow the failed thread to rewind only to the checkpoint of the appropriate sub-thread rather than to the start of its execution; this significantly reduces the cost of mis-speculation. We evaluate our hardware support for large and dependent speculative threads in the database domain and find that, on a simulated 4-processor chip multiprocessor, three of the five TPC-C transactions achieve a speedup in transaction response time by a factor of 1.9 to 2.9.
ISBN: 9780769526089, 076952608X
ISSN: 1063-6897, 2575-713X
DOI: 10.1109/ISCA.2006.43
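The summary describes sub-threads only at a high level: a failed thread rewinds to the checkpoint of the appropriate sub-thread instead of to its start. The Python sketch below is a toy software model of that idea, not the paper's hardware design. Several details are hypothetical assumptions, since the record does not specify how or when the hardware starts sub-threads: the checkpointing policy (a new sub-thread every `interval` instructions, capped at `max_subthreads`), the assumption that the conflicting store logically precedes all of the thread's loads, and all names (`SpeculativeThread`, `step`, `on_remote_store`, `wasted_work`).

```python
from dataclasses import dataclass, field


@dataclass
class SubThread:
    start: int  # instruction count at which this checkpoint was taken
    read_set: set = field(default_factory=set)  # addresses loaded speculatively


class SpeculativeThread:
    """Toy model of one large speculative thread divided into sub-threads.

    Hypothetical policy: start a new sub-thread (checkpoint) once the current
    one has executed `interval` instructions, up to `max_subthreads` in total.
    """

    def __init__(self, interval: int, max_subthreads: int):
        self.interval = interval
        self.max_subthreads = max_subthreads
        self.pos = 0  # dynamic instructions executed so far
        self.subthreads = [SubThread(start=0)]

    def step(self, load_addr=None):
        """Execute one instruction, optionally a load from `load_addr`."""
        current = self.subthreads[-1]
        if (self.pos - current.start >= self.interval
                and len(self.subthreads) < self.max_subthreads):
            current = SubThread(start=self.pos)  # take a new checkpoint
            self.subthreads.append(current)
        if load_addr is not None:
            current.read_set.add(load_addr)
        self.pos += 1

    def on_remote_store(self, addr) -> int:
        """Handle a store to `addr` by a logically earlier thread.

        Assumes the store precedes every load this thread has done, so any
        earlier load of `addr` is a violated dependence. Rewinds to the
        checkpoint of the earliest sub-thread that loaded `addr` and returns
        the number of instructions discarded (0 if nothing was violated).
        """
        for i, st in enumerate(self.subthreads):
            if addr in st.read_set:
                wasted = self.pos - st.start
                self.pos = st.start
                del self.subthreads[i + 1:]  # later sub-threads are squashed too
                st.read_set.clear()          # the violated sub-thread restarts empty
                return wasted
        return 0


def wasted_work(max_subthreads: int) -> int:
    """Run 5,000 instructions with a single load of 0x40 at instruction 3,500,
    then deliver a conflicting store from an earlier thread."""
    t = SpeculativeThread(interval=1000, max_subthreads=max_subthreads)
    for i in range(5000):
        t.step(load_addr=0x40 if i == 3500 else None)
    return t.on_remote_store(0x40)


print(wasted_work(1))  # one checkpoint only: rewind to start, 5000 discarded
print(wasted_work(8))  # rewind to the sub-thread starting at 3000: 2000 discarded
```

With a single checkpoint the violation discards all 5,000 instructions, while with sub-threads only the work since the checkpoint at instruction 3,000 is redone. Under this toy model's assumptions, that difference is the reduction in mis-speculation cost the summary refers to.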