Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. However, the SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. On the other hand, researchers who have...

Full description

Saved in:
Bibliographic Details
Published in2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON) pp. 1 - 4
Main Authors Baca, Herwin Alayn Huillcen, de Luz Palomino Valdivia, Flor
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.08.2019
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. However, the SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. On the other hand, researchers who have tried to optimize the performance of SpMV using storage formats other than CSR (Compressed Storage Row), experienced extra time in the conversion between formats. we propose to optimize the performance of SpMV by reducing the latency of copying data between host and device, so we present CSR-Async, a new program that takes into account CSR-Vector for the kernel code in GPU and uses pinned memory for host vectors and makes asynchronous copies form host to device and vice verse making use of non-default streams and overlap data transfer. CSR-Async has better performance than CSR-Vector and CSR-Scalar, since it is 2.26 and 1.73 times faster respectively.
DOI:10.1109/INTERCON.2019.8853624