Parallel GEMM-based convolution for deep learning on multicore RISC-V processors

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 80, No. 9, pp. 12623–12643
Main Authors: Ramírez, Cristian; Castelló, Adrián; Martínez, Héctor; Quintana-Ortí, Enrique S.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.06.2024 (Springer Nature B.V.)
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-024-05927-y

Summary: We address the efficient implementation of the convolution operator on the GAP8 parallel ultra-low power platform (PULP), a heterogeneous multi-core processor equipped with a fabric controller (FC); a cluster of eight compute cores; and a four-level memory hierarchy with scratchpads instead of conventional, hardware-assisted cache memories. Our solution for this platform transforms the convolution into a general matrix–matrix multiplication (gemm) via the lowering approach, demonstrating that it is possible to attain reasonable performance on the GAP8 by carefully adapting techniques such as tiling and loop parallelism, which are mainstream in the multi-threaded, cache-aware realization of gemm.
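The "lowering" approach mentioned in the abstract is commonly realized via an im2col transform: each input patch touched by the filter is flattened into a row of an auxiliary matrix, so the whole convolution collapses into a single matrix product. The following is a minimal illustrative sketch of that idea (function names and the pure-Python, single-channel formulation are our own; the paper's actual implementation targets tiled, parallel gemm on the GAP8):

```python
def im2col(x, kh, kw):
    """Unfold a 2-D input (list of rows) into the im2col matrix.

    Each output row is one kh-by-kw patch of x, flattened, so that the
    convolution of x with any kh-by-kw filter w reduces to multiplying
    this matrix by the flattened filter (a gemm in the general case).
    """
    h, w = len(x), len(x[0])
    rows = []
    for i in range(h - kh + 1):          # slide the window vertically
        for j in range(w - kw + 1):      # ... and horizontally
            patch = [x[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            rows.append(patch)
    return rows

def gemm_conv(x, kernel):
    """Convolve by lowering: im2col followed by a matrix product.

    With a single filter the gemm degenerates to a matrix-vector
    product; with F filters the flattened kernels form an F-column
    matrix and the same im2col matrix is reused.
    """
    kh, kw = len(kernel), len(kernel[0])
    flat_k = [kernel[di][dj] for di in range(kh) for dj in range(kw)]
    lowered = im2col(x, kh, kw)
    out_w = len(x[0]) - kw + 1
    flat = [sum(a * b for a, b in zip(row, flat_k)) for row in lowered]
    # Fold the flat result back into the 2-D output feature map.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]
```

The price of this transform is the memory blow-up of the im2col matrix (each input element is replicated up to kh*kw times), which is precisely why tiling against the GAP8's small scratchpad levels matters in the paper's setting.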