A Reschedulable Dataflow-SIMD Execution for Increased Utilization in CGRA Cross-Domain Acceleration

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 42, no. 3, pp. 874-886
Main Authors: Yin, Chen; Jing, Naifeng; Jiang, Jianfei; Wang, Qin; Mao, Zhigang
Format: Journal Article
Language: English
Published: New York: IEEE, 01.03.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-domain acceleration, control flow and memory accesses often degrade processing element (PE) utilization and array efficiency by breaking the intact dataflow graph (DFG) into regions with mismatched pipelining rates and access-execution stages. In this article, we propose a reschedulable dataflow and SIMD execution, which decouples a DFG with mismatched dataflow into multiple independent subgraphs. We map only one subgraph at a time, fully unrolled, and reschedule the different subgraphs serially at runtime. As a result, each subgraph executes in its own way without interfering with the others. At the same time, an individual subgraph can stream its dataflow to improve utilization, while its unrolled instances, composed as SIMD, facilitate request coalescing for efficient memory access. With lightweight hardware modifications, our design can be integrated into a general CGRA architecture. The experimental results show that our proposal improves performance and energy efficiency by 1.6× and 1.8×, respectively, over a stream-dataflow CGRA with static scheduling (Plasticine), and by 1.5× and 2.7× over one with dynamic scheduling (TIA), and outperforms Plasticine organized as vector-SIMD by 1.2× and 1.4×.
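The decoupling and SIMD ideas in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cut rule (start a new subgraph wherever a node's initiation interval changes), the hypothetical `Node` fields `ii` and `stride`, the 8-lane unroll width and the 64-byte line size are all assumptions chosen for illustration. Subgraphs are visited one at a time in program order, each fully unrolled across the lanes, and the per-lane addresses of every memory node are grouped by line to show how SIMD composition lets requests be merged.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One DFG operation in a toy, topologically ordered chain (hypothetical model)."""
    name: str
    ii: int                       # assumed initiation interval: cycles between firings
    stride: Optional[int] = None  # assumed byte stride per lane for memory nodes


def split_by_rate(chain: List[Node]) -> List[List[Node]]:
    """Cut the chain into subgraphs wherever the pipelining rate (ii) changes."""
    subgraphs: List[List[Node]] = []
    for node in chain:
        if subgraphs and subgraphs[-1][-1].ii == node.ii:
            subgraphs[-1].append(node)   # same rate: stay in the current subgraph
        else:
            subgraphs.append([node])     # rate mismatch: start a new subgraph
    return subgraphs


def coalesce(addrs: List[int], line_bytes: int = 64) -> Dict[int, List[int]]:
    """Group lane indices by the memory line their byte address falls in."""
    lines: Dict[int, List[int]] = {}
    for lane, addr in enumerate(addrs):
        lines.setdefault(addr // line_bytes, []).append(lane)
    return lines


def schedule(chain: List[Node], lanes: int = 8, base: int = 0,
             line_bytes: int = 64) -> List[Tuple[List[str], int, int]]:
    """Visit subgraphs one at a time, each unrolled over `lanes` SIMD instances.

    Returns (node names, per-lane requests, coalesced requests) per subgraph.
    """
    plan = []
    for sg in split_by_rate(chain):
        raw = merged = 0
        for node in sg:
            if node.stride is None:
                continue  # compute node: no memory traffic
            addrs = [base + lane * node.stride for lane in range(lanes)]
            raw += lanes
            merged += len(coalesce(addrs, line_bytes))
        plan.append(([n.name for n in sg], raw, merged))
    return plan


if __name__ == "__main__":
    dfg = [Node("ld_a", 1, stride=4), Node("mul", 1),
           Node("br", 3), Node("ld_b", 2, stride=64)]
    for names, raw, merged in schedule(dfg):
        print(names, raw, merged)
```

With the sample chain, `ld_a` and `mul` share one subgraph and the eight 4-byte-strided loads of `ld_a` fall in a single 64-byte line, so they merge into one request; the 64-byte-strided `ld_b` still needs eight. Splitting on a rate change is only one plausible stand-in for the article's control-flow and memory-access boundaries.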
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2022.3185544