Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

Bibliographic Details
Published in	2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON) pp. 1 - 4
Main Authors	Baca, Herwin Alayn Huillcen; de Luz Palomino Valdivia, Flor
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2019
Subjects	Arrays; CSR; Data transfer; GPU; Graphics processing units; Kernel; latency; Mathematical model; Overlap Data Transfer; Performance evaluation; Pinned Memory; Sparse matrices; SpMV
Summary:	The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. However, SpMV on graphics processing units (GPUs) performs poorly due to irregular memory access patterns, load imbalance, and reduced parallelism. Moreover, researchers who have tried to optimize SpMV using storage formats other than CSR (Compressed Sparse Row) have incurred extra time converting between formats. We propose to optimize the performance of SpMV by reducing the latency of copying data between host and device. We present CSR-Async, a new program that uses CSR-Vector for the GPU kernel code, uses pinned memory for the host vectors, and makes asynchronous copies from host to device and vice versa, using non-default streams to overlap data transfers. CSR-Async performs better than CSR-Vector and CSR-Scalar, being 2.26 and 1.73 times faster, respectively.
DOI:	10.1109/INTERCON.2019.8853624
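
For readers unfamiliar with the CSR format named in the summary, the following is a minimal CPU sketch of the computation that CSR-based SpMV kernels perform. It is not the authors' CSR-Async code: the function name `spmv_csr`, the array names, and the 3x4 example matrix are illustrative assumptions, and the GPU-specific parts of the paper (pinned memory, non-default streams, overlapped transfers) are not shown. On a GPU, CSR-Scalar typically assigns one thread to each iteration of the outer row loop, while CSR-Vector assigns one warp per row and reduces the partial products within the warp.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i+1] is the slice of col_idx/values holding row i.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):            # each row is independent work
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # irregular read of x
        y[row] = acc
    return y


# Hypothetical 3x4 example matrix:
# [[10, 0, 0, 2],
#  [ 0, 0, 3, 0],
#  [ 4, 5, 0, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 3, 2, 0, 1, 3]
values = [10.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [18.0, 9.0, 38.0]
```

The rows hold 2, 1, and 3 nonzeros, which is a small instance of the load imbalance the summary mentions, and the indexed reads `x[col_idx[k]]` are the source of the irregular memory access pattern.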