Dedicated Instruction Set for Pattern-based Data Transfers: an Experimental Validation on Systems Containing In-Memory Computing Units

Bibliographic Details
Published in	IEEE transactions on computer-aided design of integrated circuits and systems Vol. 42; no. 11; p. 1
Main Authors	Mambu, Kevin, Charles, Henri-Pierre, Kooli, Maha
Format	Journal Article
Language	English
Published	New York IEEE 01.11.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Central Processing Unit; Computational modeling; Computer architecture; Convolution; Data transfer; Data transmission; Embedded systems; In-Memory Computing; Instruction Set Architecture; Instruction sets; Instruction sets (computers); Nodes; Non-Von Neumann; Optimization; Pattern; Performance Analysis; Programming; Programming Model; Random access memory; RISC; Static random access memory; Stencil
Summary:	In-Memory Computing (IMC) aims to close the performance gap between CPUs and memories caused by the memory wall. However, general-purpose IMC does not optimize data transfers for access patterns such as stencils and convolutions. This paper proposes a new Instruction Set Architecture (ISA) and a novel pattern encoding for IMC that transfer and organize data streams so that computation can be performed efficiently. This instruction set is implemented on the Data-locality Management Unit (DMU) as a subset of the Computational SRAM (C-SRAM) Instruction Set Architecture. A programming model for interacting with the DMU at the language level is also presented. The DMU ISA is evaluated on six applications run on three different system nodes. These system nodes are based on existing RISC-V cores and range from the embedded to the high-performance computing domain. Experiments show on average a speed-up of ×8.81, an energy reduction factor of ×6.81, and an improvement in the number of operations per cycle of ×4.59 for the C-SRAM architecture integrating the proposed DMU ISA, compared to a reference implementation on embedded systems. Results also show an improvement in the number of operations per cycle of ×2.99 compared to a reference implementation on all system nodes.
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2023.3258346