A many-core architecture for in-memory data processing

Bibliographic Details
Published in: 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 245-258
Main Authors: Agrawal, Sandeep R; Idicula, Sam; Raghavan, Arun; Vlachos, Evangelos; Govindaraju, Venkatraman; Varadarajan, Venkatanathan; Balkesen, Cagri; Giannikis, Georgios; Roth, Charlie; Agarwal, Nipun; Sedlar, Eric
Format: Conference Proceeding
Language: English
Published: New York, NY, USA: ACM, 14.10.2017
Series: ACM Conferences

Summary: For many years, the highest energy cost in processing has been data movement rather than computation, and energy is the limiting factor in processor design [21]. As the data needed for a single application grows to exabytes [56], there is a clear opportunity to design a bandwidth-optimized architecture for big data computation by specializing hardware for data movement. We present the Data Processing Unit (DPU), a shared-memory many-core that is specifically designed for high-bandwidth analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and partitioning operations at the memory controller that is sufficient to keep up with DDR bandwidth. The DPU also accelerates core-to-core communication via a unique hardware RPC mechanism called the Atomic Transaction Engine. Comparison of a DPU chip fabricated in 40 nm with a Xeon processor on a variety of data processing applications shows a 3x-15x performance-per-watt advantage.
ISBN: 1450349528; 9781450349529
ISSN: 2379-3155
DOI: 10.1145/3123939.3123985
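The summary notes that the DMS accelerates partitioning operations at the memory controller. As a rough orientation for readers, the following is a minimal software sketch of hash (radix) partitioning, the kind of scatter operation such hardware would offload; the hash function, partition count, and all names here are illustrative assumptions, not details taken from the paper.

```python
NPART = 4  # illustrative number of output partitions (power of two)

def hash32(key: int) -> int:
    """Illustrative 32-bit multiplicative hash; in the DPU this kind of
    per-element work is what the DMS would perform in hardware."""
    return (key * 2654435761) & 0xFFFFFFFF

def partition(keys):
    """Scatter each key into one of NPART buckets, selected by the low
    bits of its hash. Every key lands in exactly one bucket."""
    buckets = [[] for _ in range(NPART)]
    for k in keys:
        buckets[hash32(k) & (NPART - 1)].append(k)
    return buckets

buckets = partition([7, 12, 19, 3, 42, 5, 88, 101])
assert sum(len(b) for b in buckets) == 8
```

In software this scatter is memory-bandwidth-bound, which is why placing the equivalent logic at the memory controller, as the summary describes, can keep pace with DDR bandwidth.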