Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors

Bibliographic Details
Published in	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, no. 9, pp. 986-997
Main Authors	Cong, J.; Han, Guoling; Zhang, Zhiru
Format	Journal Article
Language	English
Published	Piscataway, NJ: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2006
Subjects	Algorithm; Application specific processors; Applied sciences; Architecture; Bandwidth; Boarded computer; Coding; Communication system control; Compiler; Compiler optimization; Computers, microcomputers; Configurable; Cost lowering; Custom circuit; Data analysis; Data bandwidth optimization; Data transmission; Design. Technologies. Operation analysis. Testing; Electronics; Exact sciences and technology; Hardware; Input-output equipment; Instruction sets; Integrated circuit; Integrated circuits; Integrated circuits by function (including memories and processors); Microarchitecture; Optimization; Optimizing compilers; Performance evaluation; Processor; Processors; Quantitative analysis; Reconfigurable architectures; Reconfigurable logic; Register allocation; Registers; Semiconductor electronics. Microelectronics. Optoelectronics. Solid state devices; Shadows; Very large scale integration
Summary:	Many commercially available embedded processors are capable of extending their base instruction set for a specific domain of applications. While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper, we first present a quantitative analysis of the data bandwidth limitation in configurable processors, and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. Specifically, we embed a single control bit in the instruction op-codes to selectively copy the execution results to a set of hash-mapped shadow registers in the write-back stage. This can efficiently reduce the communication overhead due to data transfers between the core processor and the custom logic. We also present a novel simultaneous global shadow register binding with a hash function generation algorithm to take full advantage of the extension. The application of our approach leads to a nearly optimal performance speedup
ISSN:	1063-8210, 1557-9999
DOI:	10.1109/TVLSI.2006.884050
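
Illustration (not part of the catalog record): the summary describes a control bit in the opcode that selectively copies an instruction's result into a hash-mapped shadow register during write-back, so that custom logic can read extra operands without additional register file ports. The sketch below is a minimal behavioral model of that idea only. The register counts, the modulo hash, and every name (`Core`, `write_back`, `shadow_index`, `custom_op`) are assumptions made for illustration; the paper generates its hash function together with the shadow register binding, and that algorithm is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List

NUM_CORE_REGS = 32    # assumed size of the core register file
NUM_SHADOW_REGS = 4   # assumed number of shadow registers


def shadow_index(dest_reg: int) -> int:
    """Map a core register number to a shadow register slot.

    Assumption: a plain modulo stands in for the hash function that the
    paper derives together with the shadow register binding.
    """
    return dest_reg % NUM_SHADOW_REGS


@dataclass
class Core:
    regs: List[int] = field(default_factory=lambda: [0] * NUM_CORE_REGS)
    shadow: List[int] = field(default_factory=lambda: [0] * NUM_SHADOW_REGS)

    def write_back(self, dest_reg: int, value: int, shadow_bit: bool) -> None:
        """Write-back stage: always update the core register; copy the
        result to the hashed shadow slot only when the control bit is set."""
        self.regs[dest_reg] = value
        if shadow_bit:
            self.shadow[shadow_index(dest_reg)] = value

    def custom_op(self, reg_a: int, reg_b: int, shadow_slots: List[int]) -> int:
        """Hypothetical custom instruction: two operands arrive through the
        core register read ports, the rest come from shadow registers."""
        extra = sum(self.shadow[s] for s in shadow_slots)
        return self.regs[reg_a] + self.regs[reg_b] + extra
```

In this model, an instruction whose result is later consumed by custom logic would be emitted with `shadow_bit=True`, while all other instructions leave the shadow registers untouched. Two shadowed values whose destination registers hash to the same slot would overwrite each other, which is why the summary pairs the hardware extension with a compiler pass that chooses the binding and the hash function together.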