A Reschedulable Dataflow-SIMD Execution for Increased Utilization in CGRA Cross-Domain Acceleration
Published in | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 42, no. 3, pp. 874-886 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.03.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-domain acceleration, control flow and memory accesses often degrade processing element (PE) utilization and array efficiency by breaking the intact dataflow graph (DFG) into regions with mismatched pipelining rates and access-execution stages. In this article, we propose a reschedulable dataflow and SIMD execution, which decouples a DFG with mismatched dataflow into multiple independent subgraphs. We map only one subgraph at a time, fully unrolled, and reschedule the different subgraphs serially at runtime. Each subgraph therefore works independently, without interfering with the others. At the same time, an individual subgraph can execute its dataflow as a stream to improve utilization, while its unrolled instances, composed as SIMD, facilitate request coalescing for efficient memory access. With lightweight hardware modifications, our design can be integrated into a general CGRA architecture. Experimental results show that our proposal improves performance and energy efficiency over a statically scheduled stream-dataflow CGRA (Plasticine) by $1.6\times$ and $1.8\times$, over a dynamically scheduled one (TIA) by $1.5\times$ and $2.7\times$, and outperforms Plasticine organized as vector-SIMD by $1.2\times$ and $1.4\times$. |
---|---|
ISSN: | 0278-0070 1937-4151 |
DOI: | 10.1109/TCAD.2022.3185544 |
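
The execution scheme described in the summary can be illustrated with a small software model. This is only a sketch under stated assumptions, not the article's hardware design: the names `Subgraph`, `coalesce`, and `run_rescheduled`, the lane count, and the 64-byte memory line are all hypothetical and chosen for illustration. It shows subgraphs being run one at a time (serial rescheduling), each applied across a group of unrolled instances, and the addresses issued by one group being merged into distinct memory lines.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Subgraph:
    """One independent piece of a decoupled dataflow graph (hypothetical model)."""
    name: str
    op: Callable[[int], int]  # per-element computation performed by this subgraph


def coalesce(addresses: Iterable[int], line_bytes: int = 64) -> List[int]:
    """Return the distinct memory-line indices touched by one group of addresses.

    Assumption: requests falling in the same `line_bytes`-sized line can be
    served by a single access.
    """
    return sorted({addr // line_bytes for addr in addresses})


def run_rescheduled(
    subgraphs: List[Subgraph], data: List[int], lanes: int = 4
) -> Tuple[List[int], List[str]]:
    """Run subgraphs serially; each one is applied in groups of `lanes` instances."""
    values = list(data)
    schedule = []  # order in which subgraphs occupied the (modelled) array
    for sg in subgraphs:
        # Only this subgraph is "mapped"; its unrolled copies form one SIMD group.
        for start in range(0, len(values), lanes):
            for i in range(start, min(start + lanes, len(values))):
                values[i] = sg.op(values[i])
        schedule.append(sg.name)
    return values, schedule


if __name__ == "__main__":
    stages = [Subgraph("scale", lambda x: x * 2), Subgraph("offset", lambda x: x + 1)]
    print(run_rescheduled(stages, [1, 2, 3, 4, 5]))
    print(coalesce([0, 4, 8, 12]))  # four 4-byte-strided requests in one line
```

In this model the intermediate values simply stay in a Python list between subgraphs; how intermediate data is buffered between rescheduled subgraphs in the actual design is not stated in the summary.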