Bank on Compute-Near-Memory: Design Space Exploration of Processing-Near-Bank Architectures

Bibliographic Details
Published in	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, No. 11, pp. 4117–4129
Main Authors	Medina, Rafael; Ansaloni, Giovanni; Zapater, Marina; Levisse, Alexandre; Alinezhad Chamazcoti, Saeideh; Evenblij, Timon; Biswas, Dwaipayan; Catthoor, Francky; Atienza, David
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 November 2024
Subjects	Accelerator; Bandwidth; compute-near-memory (CnM); Configuration management; DRAM; Dynamic random access memory; Integrated circuits; Kernel; Memory management; Microprocessors; Pareto optimum; performance evaluation; Process control; processing-in-memory; Random access memory; Resource management; Space exploration; system simulation; Tuning

More Information
Summary:	Near-DRAM computing strategies advocate for providing computational capabilities close to where data is stored. Although this paradigm can effectively address the memory-to-processor communication bottleneck, it also presents new challenges: The strict resource constraints in the memory periphery demand careful tailoring of architectural elements. We herein propose a novel framework and methodology to explore compute-near-memory designs that interface to DRAM memory banks, demonstrating the area, energy, and performance tradeoffs as a function of the architectural configuration. We exemplify this methodology by conducting two studies on compute-near-bank designs: 1) analyzing the interaction between control and data resources, and 2) exploring the integration of processing units with different DRAM standards. According to our study, the optimal size ratios between instruction and data capacity vary from 2× to 4× across benchmarks from representative application domains. The retrieved Pareto-optimal solutions from our framework improve on state-of-the-art designs, e.g., achieving a 50% performance increase on matrix operations with 15% energy overhead relative to the FIMDRAM design. In addition, the exploration of DRAM standards shows the interplay between available internal bandwidth, performance, and area overhead. For example, a threefold increase in bandwidth raises performance by 47% across workloads at a 34% extra area cost.
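The summary frames the exploration as a search for Pareto-optimal configurations over area, energy, and performance. As a rough illustration of what "Pareto-optimal" means for those three objectives, here is a minimal sketch of a non-dominance filter. It is not the authors' framework: `DesignPoint`, `dominates`, `pareto_front`, the configuration names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    """Hypothetical compute-near-bank configuration and its (made-up) metrics."""
    name: str
    performance: float  # higher is better (e.g., normalized throughput)
    energy: float       # lower is better (e.g., normalized energy)
    area: float         # lower is better (e.g., normalized area overhead)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = (a.performance >= b.performance
                and a.energy <= b.energy
                and a.area <= b.area)
    strictly_better = (a.performance > b.performance
                       or a.energy < b.energy
                       or a.area < b.area)
    return no_worse and strictly_better


def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Return the non-dominated points, preserving input order (simple O(n^2) scan).

    A point never dominates itself, so no self-exclusion check is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points (illustrative values only).
points = [
    DesignPoint("imem2x", 1.00, 1.00, 1.00),
    DesignPoint("imem4x", 1.20, 1.05, 1.10),    # faster but costlier: a tradeoff, not dominated
    DesignPoint("imem8x", 1.15, 1.10, 1.25),    # dominated by imem4x
    DesignPoint("baseline", 0.90, 1.00, 1.00),  # dominated by imem2x
]

front = pareto_front(points)
print([p.name for p in front])  # ['imem2x', 'imem4x']
```

The quadratic scan is chosen for clarity; a real exploration over many configurations would likely use a sorting-based or incremental front algorithm instead.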
ISSN:	0278-0070, 1937-4151
DOI:	10.1109/TCAD.2024.3442989