Tuning compiler optimizations for simultaneous multithreading

Bibliographic Details
Published in	Proceedings of 30th Annual International Symposium on Microarchitecture, pp. 114-124
Main Authors	Lo, J.L., Eggers, S.J., Levy, H.M., Parekh, S.S., Tullsen, D.M.
Format	Conference Proceeding
Language	English
Published	IEEE 1997
Subjects	Computer architecture; Context; Delay; Multithreading; Optimizing compilers; Processor scheduling; Program processors; Software algorithms; Surface-mount technology; Yarn
Summary:	Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost, inter-processor communication. This paper reexamines several compiler optimizations in the context of simultaneous multithreading (SMT), a processor architecture that issues instructions from multiple threads to the functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current multiprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding latencies. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
ISBN:	0818679778 9780818679773
ISSN:	1072-4451
DOI:	10.1109/MICRO.1997.645803
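
Illustrative note: the summary contrasts blocked and cyclic loop-iteration scheduling. The sketch below is not taken from the paper; it only shows how the two schemes distribute iteration indices among threads. The function names `blocked_schedule` and `cyclic_schedule` are hypothetical, and the code models the assignment of iterations only, not actual SMT execution.

```python
def blocked_schedule(n_iters, n_threads):
    """Blocked: each thread gets one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return [
        list(range(t * chunk, min((t + 1) * chunk, n_iters)))
        for t in range(n_threads)
    ]


def cyclic_schedule(n_iters, n_threads):
    """Cyclic: iteration i goes to thread i % n_threads."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]


if __name__ == "__main__":
    # Hypothetical example: 8 iterations shared by 4 threads.
    print("blocked:", blocked_schedule(8, 4))
    print("cyclic: ", cyclic_schedule(8, 4))
```

With a cyclic assignment, the threads running at a given moment work on neighboring iterations, so they tend to touch neighboring data. On a processor whose threads share caches and other resources, as the summary describes for SMT, that proximity is plausibly the reason the cyclic scheme is preferred; the paper itself should be consulted for the measured results.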