Dedicated Instruction Set for Pattern-based Data Transfers: an Experimental Validation on Systems Containing In-Memory Computing Units
Published in | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 42, no. 11, p. 1 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.11.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | In-Memory Computing (IMC) aims to close the performance gap between CPUs and memories caused by the memory wall. However, general-purpose IMC does not optimize data transfers for access patterns such as stencils and convolutions. This paper proposes a new Instruction Set Architecture (ISA) and a novel pattern encoding for IMC that transfer and organize data streams so that computation can be performed efficiently. This instruction set is implemented on the Data-locality Management Unit (DMU) as a subset of the Computational SRAM (C-SRAM) Instruction Set Architecture. A programming model for interacting with the DMU at the language level is also presented. The DMU ISA is evaluated on six applications run on three different system nodes, which are based on existing RISC-V cores and range from the embedded to the high-performance computing domain. On embedded systems, experiments show an average speed-up of ×8.81, an energy reduction factor of ×6.81, and an improvement in the number of operations per cycle of ×4.59 for the C-SRAM architecture integrating the proposed DMU ISA, compared to a reference implementation. Across all system nodes, results also show an improvement in the number of operations per cycle of ×2.99 compared to a reference implementation. |
---|---|
ISSN: | 0278-0070, 1937-4151 |
DOI: | 10.1109/TCAD.2023.3258346 |
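
The summary refers to pattern-based data transfers for stencils and convolutions. As a rough software illustration only, the sketch below describes a 5-point stencil as a list of relative offsets and gathers the matching operands into one contiguous stream before computing on them. The names `Pattern`, `FIVE_POINT`, `gather`, and `apply_stencil` are hypothetical and chosen for this example; they are not the DMU instructions or the pattern encoding proposed in the paper, which this record does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical descriptor: relative (row, col) offsets around a centre point."""
    offsets: Tuple[Tuple[int, int], ...]


# 5-point stencil: the centre plus its four direct neighbours (assumed example pattern).
FIVE_POINT = Pattern(offsets=((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)))


def gather(grid: List[List[int]], row: int, col: int, pattern: Pattern) -> List[int]:
    """Collect the operands named by `pattern` around (row, col) into one contiguous list."""
    return [grid[row + dr][col + dc] for dr, dc in pattern.offsets]


def apply_stencil(grid: List[List[int]], pattern: Pattern) -> List[List[int]]:
    """Sum the gathered operands for every interior point; border points are skipped."""
    rows, cols = len(grid), len(grid[0])
    return [
        [sum(gather(grid, r, c, pattern)) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
result = apply_stencil(grid, FIVE_POINT)
print(result)  # [[30, 35]]
```

Separating the access pattern from the arithmetic is the point of the sketch: the same `gather` routine could serve a different stencil or a convolution window simply by swapping the offset list.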