IMR: High-Performance Low-Cost Multi-Ring NoCs
A ring topology is a common solution of network-on-chip (NoC) in industry, but is frequently criticized to have poor scalability. In this paper, we present a novel type of multi-ring NoC called isolated multi-ring (IMR), which can even support chip multiprocessors (CMPs) with 1,024 cores. In IMR, an...
Saved in:
Published in | IEEE transactions on parallel and distributed systems Vol. 27; no. 6; pp. 1700 - 1712 |
---|---|
Main Authors | , , , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.06.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | A ring topology is a common solution of network-on-chip (NoC) in industry, but is frequently criticized to have poor scalability. In this paper, we present a novel type of multi-ring NoC called isolated multi-ring (IMR), which can even support chip multiprocessors (CMPs) with 1,024 cores. In IMR, any pair of cores are connected via at least one isolated ring, so that each packet can reach the destination without transferring from one ring to another. Therefore, IMR no longer needs expensive routers as mesh, which not only enhances the network performance but also reduces hardware overheads. We utilize simulated evolution to design optimized IMR topologies. We compare these IMR topologies against nine representative NoCs (e.g., traditional mesh, multi mesh, low-cost mesh, Express-virtual-channels mesh (EVC), torus ring, and hierarchical ring). We observe from experiments that IMR significantly outperforms its competitors in both saturation throughput and latency across all scenarios considered. For example, in a 16 × 16 CMP, IMR improves the saturation throughput of a state-of-the-art mesh (EVC) by 265.29 percent on average, and reduces the average packet latency on SPLASH-2 application traces by 71.58 percent, while consuming 5.08 percent less area and 9.76 percent less power. In a 32 × 32 CMP, IMR averagely improves the saturation throughput of EVC by 191.58 percent, and averagely reduces the packet latency on SPLASH-2 application traces by 23.09 percent, while consuming 2.86 percent less area and 10.81 percent less power. |
---|---|
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2015.2465905 |