Dynamic-vector execution on a general purpose EDGE chip multiprocessor

Bibliographic Details
Published in	2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), pp. 18-25
Main Authors	Duric, Milovan, Palomar, Oscar, Smith, Aaron, Stanic, Milan, Unsal, Osman, Cristal, Adrian, Valero, Mateo, Burger, Doug, Veidenbaum, Alex
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2014
Subjects	Computational modeling; Computer architecture; Hardware; Instruction sets; Message systems; Registers; Vectors

Summary:	This paper proposes a cost-effective technique that morphs the available cores of a low-power chip multiprocessor (CMP) into an accelerator for data-level parallel (DLP) workloads. Instead of adding a special-purpose vector architecture as an accelerator, our technique leverages the resources of each CMP core to mimic the functionality of a vector processor. The morphing provides dynamic vector execution (DVX) on a general-purpose CMP by adding minimal hardware for vector control. DVX enhances vector execution by dynamically configuring the allocation of compute and memory resources to match the requirements of a particular workload. As an energy-efficient substrate, we use modest dual-issue cores based on an Explicit Data Graph Execution (EDGE) architecture. The results show that a DVX-enabled 4-core EDGE CMP improves the energy-delay product by more than 14x, at the cost of only 1.1% additional area. We compare DVX against a CMP that adds a dedicated DLP accelerator based on a conventional high-performance vector design. The vector accelerator increases the area footprint by more than 74%, which greatly increases the cost of the modest processor. DVX avoids these additional costs and still achieves more than 86% of the speedup obtained with the dedicated accelerator.
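The summary reports its results as an energy-delay product (EDP) improvement and as a fraction of the dedicated accelerator's speedup. The following minimal sketch only illustrates how such ratios are computed. All numbers in it are hypothetical and are not results from the paper, and it assumes that "fraction of the speedup" means the ratio of the two speedups.

```python
# Illustration of the metrics named in the summary.
# All inputs are HYPOTHETICAL values, not data from the paper.

def edp(energy: float, delay: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy * delay

def edp_improvement(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """How many times lower the candidate's EDP is than the baseline's.

    Each argument is an (energy, delay) pair in consistent units.
    """
    return edp(*baseline) / edp(*candidate)

def speedup_fraction(speedup_a: float, speedup_b: float) -> float:
    """Fraction of configuration B's speedup reached by configuration A
    (assumed here to be a simple ratio of the two speedups)."""
    return speedup_a / speedup_b

# Hypothetical: 4x shorter runtime and 3.5x less energy give a 14x lower EDP.
baseline = (7.0, 4.0)  # (energy, delay), arbitrary units
dvx_like = (2.0, 1.0)
print(edp_improvement(baseline, dvx_like))  # 14.0

# Hypothetical speedups of 4.3x (morphed cores) vs 5.0x (dedicated accelerator).
print(speedup_fraction(4.3, 5.0))
```

Because EDP multiplies energy by delay, a design that is both faster and more frugal improves EDP by the product of the two gains, which is why EDP ratios can be much larger than the speedup alone.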
DOI:	10.1109/SAMOS.2014.6893190