On the Design of Time-Constrained and Buffer-Optimal Self-Timed Pipelines

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 38, no. 8, pp. 1515-1528
Main Authors: Jiang, Weiwen; Sha, Edwin Hsing-Mean; Zhuge, Qingfeng; Yang, Lei; Chen, Xianzhang; Hu, Jingtong
Format: Journal Article
Language: English
Published: New York: IEEE, 01.08.2019
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Pipelining is a powerful technique for achieving high performance in computing systems. However, as computing platforms grow in scale and integrate heterogeneous processing elements (PEs) such as CPUs, GPUs, and field-programmable gate arrays (FPGAs), it becomes difficult to employ a global clock to build synchronous pipelines. Self-timed (or asynchronous) pipelines are therefore usually adopted. Because of their complex runtime behavior, however, performance modeling and systematic optimization are more complicated for self-timed pipeline (STP) systems than for synchronous ones. This paper employs marked graph theory to model STPs and presents algorithms to detect performance bottlenecks. Based on the proposed model, we observe that system performance can be improved by inserting buffers. Because memory resources on the PEs are limited, it is critical to minimize the number of buffers in an STP while satisfying the required timing constraints. We propose integer linear programming (ILP) formulations to obtain optimal solutions and devise efficient algorithms to obtain near-optimal solutions. Experimental results show that the proposed algorithms achieve a 53.10% improvement in maximum performance and a 54.04% reduction in the number of buffers, compared with the existing technique for the slack matching problem.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2018.2846642
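The summary describes modeling STPs as marked graphs, detecting bottlenecks, and inserting buffers to improve performance. As a rough illustration only, the sketch below uses the standard result for timed marked graphs: the steady-state cycle time is the maximum, over all directed cycles, of the cycle's total node delay divided by its token count, and the cycle that attains it is the bottleneck. Several things here are assumptions, not taken from the paper: the three-stage delays and edge list, the choice to model one extra buffer as one extra token on a chosen edge, and the exhaustive search in `min_buffers`, which stands in for the authors' ILP formulations and near-optimal algorithms. Cycles are enumerated naively, so this suits only tiny graphs.

```python
from __future__ import annotations

from fractions import Fraction
from itertools import product

# Hypothetical encoding: delays[i] is the execution time of node i, and each
# edge is (src, dst, tokens); tokens stand for initial data items or free
# buffer slots. This is an illustrative model, not the paper's exact one.
Edge = tuple[int, int, int]


def simple_cycles(n: int, edges: list[Edge]) -> list[list[int]]:
    """Enumerate simple cycles as lists of edge indices (tiny graphs only)."""
    adj: dict[int, list[tuple[int, int]]] = {u: [] for u in range(n)}
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
    cycles: list[list[int]] = []

    def dfs(start: int, node: int, visited: set[int], path: list[int]) -> None:
        for v, idx in adj[node]:
            if v == start:
                cycles.append(path + [idx])
            elif v > start and v not in visited:
                # Root every cycle at its smallest node so it is found once.
                visited.add(v)
                dfs(start, v, visited, path + [idx])
                visited.remove(v)

    for s in range(n):
        dfs(s, s, {s}, [])
    return cycles


def critical_cycle(delays: list[int], edges: list[Edge]) -> tuple[Fraction | float, list[int]]:
    """Return (cycle time, bottleneck cycle) via the maximum cycle ratio."""
    worst: Fraction | float = Fraction(0)
    worst_cycle: list[int] = []
    for cyc in simple_cycles(len(delays), edges):
        delay = sum(delays[edges[i][0]] for i in cyc)  # each node counted once
        tokens = sum(edges[i][2] for i in cyc)
        if tokens == 0:
            return float("inf"), cyc  # token-free cycle: the pipeline deadlocks
        ratio = Fraction(delay, tokens)
        if ratio > worst:
            worst, worst_cycle = ratio, cyc
    return worst, worst_cycle


def cycle_time(delays: list[int], edges: list[Edge]) -> Fraction | float:
    return critical_cycle(delays, edges)[0]


def min_buffers(delays: list[int], edges: list[Edge], candidates: list[int],
                target: int, max_extra: int = 6) -> tuple[int, dict[int, int]] | None:
    """Fewest extra tokens on `candidates` so that cycle time <= target.

    Exhaustive search standing in for an ILP; exponential in len(candidates).
    """
    for k in range(max_extra + 1):
        for alloc in product(range(k + 1), repeat=len(candidates)):
            if sum(alloc) != k:
                continue
            trial = list(edges)
            for e, extra in zip(candidates, alloc):
                u, v, t = trial[e]
                trial[e] = (u, v, t + extra)
            if cycle_time(delays, trial) <= target:
                return k, dict(zip(candidates, alloc))
    return None  # target not reachable within max_extra buffers


# Hypothetical 3-stage pipeline: forward edges 0->1 and 1->2 carry no initial
# data; back edges 1->0, 2->1 and the loop edge 2->0 each hold one token.
delays = [2, 5, 3]
edges = [(0, 1, 0), (1, 2, 0), (1, 0, 1), (2, 1, 1), (2, 0, 1)]
print(critical_cycle(delays, edges))  # (Fraction(10, 1), [0, 1, 4])
print(min_buffers(delays, edges, candidates=[2, 3, 4], target=8))  # (1, {2: 0, 3: 0, 4: 1})
```

With these made-up numbers, the loop 0 → 1 → 2 → 0 is the bottleneck at 10 time units per token. One extra token on edge 4 brings the cycle time down to 8, while a target of 6 needs one extra token on each of the three candidate edges. A real tool would replace the brute-force search with an ILP or a heuristic, as the summary indicates.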