Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure
Published in | High Performance Computing and Communications, pp. 807-816 |
---|---|
Main Authors | , |
Format | Book Chapter; Conference Proceeding |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2005 |
Edition | 1st ed. |
Series | Lecture Notes in Computer Science |
Subjects | |
Summary: | We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix A into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach that stores A in block compressed sparse row (BCSR) format can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix nonzeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. On a set of application matrices, we show speedups of up to 2.1× over no blocking and up to 1.8× over BCSR as used in prior work. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage. |
---|---|
ISBN: | 9783540290315, 3540290311 |
ISSN: | 0302-9743 (print), 1611-3349 (electronic) |
DOI: | 10.1007/11557654_91 |
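The summary's point about BCSR alignment can be made concrete with the usual "fill ratio": stored entries (including explicit zeros) divided by true nonzeros. The sketch below is illustrative only; the helper name `bcsr_fill_ratio` is hypothetical, and it assumes the standard aligned BCSR convention where block (I, J) starts at row r*I and column c*J. A single dense 3x3 block whose corner sits at (1, 1) straddles four aligned 3x3 tiles, so aligned BCSR stores four times as many entries as there are nonzeros.

```python
def bcsr_fill_ratio(coords, r, c):
    """Hypothetical helper: fill ratio of aligned r x c BCSR.

    coords: set of (row, col) positions of the true nonzeros.
    Aligned BCSR tiles the matrix on a fixed grid: block (I, J) covers
    rows r*I .. r*I + r - 1 and columns c*J .. c*J + c - 1.
    Every tile touched by at least one nonzero is stored densely.
    """
    blocks = {(i // r, j // c) for (i, j) in coords}
    return len(blocks) * r * c / len(coords)


# A dense 3x3 block whose top-left corner is at (1, 1): not aligned to a 3x3 grid.
coords = {(1 + di, 1 + dj) for di in range(3) for dj in range(3)}
print(bcsr_fill_ratio(coords, 3, 3))  # 4.0: the block straddles four aligned tiles
print(bcsr_fill_ratio(coords, 1, 1))  # 1.0: no blocking, no filled-in zeros
```

An unaligned format that lets this block start at (1, 1) would store it as one 3x3 block with a fill ratio of 1.0, which is the extra work the splitting-plus-UBCSR approach aims to avoid.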
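The split y = (A1 + ... + As) x can be sketched with one kernel per term, each term using its own fixed block size. This summary does not spell out the exact UBCSR layout, so the structure below is a hypothetical one chosen for illustration: each block row records its own starting matrix row and each block its own starting column, which is one way to drop the alignment constraint of BCSR. The names `UBCSR`, `row_start`, `blk_ptr`, `blk_col`, `blk_val`, `matvec_add`, and `split_spmv` are all assumptions, not the paper's API.

```python
from dataclasses import dataclass


@dataclass
class UBCSR:
    """Hypothetical unaligned-block CSR layout (illustrative, not the paper's exact one).

    Every block is r x c, but each block row records its own first matrix row and
    each block its own first matrix column, so blocks need not lie on an r x c grid.
    """
    r: int
    c: int
    row_start: list  # row_start[b]: first matrix row covered by block row b
    blk_ptr: list    # blocks of block row b are indices blk_ptr[b] .. blk_ptr[b+1]-1
    blk_col: list    # blk_col[k]: first matrix column covered by block k
    blk_val: list    # blk_val[k]: the r*c values of block k, row-major

    def matvec_add(self, x, y):
        """Accumulate y += A_k @ x for this term of the split."""
        r, c = self.r, self.c
        for b, i0 in enumerate(self.row_start):
            for k in range(self.blk_ptr[b], self.blk_ptr[b + 1]):
                j0, v = self.blk_col[k], self.blk_val[k]
                # Dense r x c block multiply. A C version with fixed r, c could
                # fully unroll these loops (register-level tiling).
                for di in range(r):
                    acc = 0.0
                    for dj in range(c):
                        acc += v[di * c + dj] * x[j0 + dj]
                    y[i0 + di] += acc


def split_spmv(terms, x, n_rows):
    """Compute y = (A_1 + ... + A_s) @ x by running each term's kernel in turn."""
    y = [0.0] * n_rows
    for term in terms:
        term.matvec_add(x, y)
    return y


# 4x4 example: one 2x2 block at rows 1-2, cols 1-2 (not aligned to a 2x2 grid),
# plus a 1x1 term holding the remaining entries A[0][0] = 5 and A[3][3] = 6.
A1 = UBCSR(r=2, c=2, row_start=[1], blk_ptr=[0, 1], blk_col=[1],
           blk_val=[[1.0, 2.0, 3.0, 4.0]])
A2 = UBCSR(r=1, c=1, row_start=[0, 3], blk_ptr=[0, 1, 2], blk_col=[0, 3],
           blk_val=[[5.0], [6.0]])
print(split_spmv([A1, A2], [1.0, 2.0, 3.0, 4.0], 4))  # [5.0, 8.0, 18.0, 24.0]
```

The design choice here is that each term carries a single block size, so its inner loops have fixed trip counts; entries that do not fit a larger block fall through to a smaller-block (here 1x1) term rather than being padded with explicit zeros.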
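The storage trade-off behind "lower memory bandwidth requirements" can be estimated with simple arithmetic. This sketch assumes 8-byte double values and 4-byte integer indices, a CSR layout of one column index per nonzero plus (m + 1) row pointers, and a BCSR layout of one column index per block plus one pointer per block row; the function names and the matrix sizes in the example are hypothetical.

```python
def csr_bytes(nnz, m, val=8, idx=4):
    """Assumed CSR storage: values + one column index per nonzero + (m + 1) row pointers."""
    return nnz * (val + idx) + (m + 1) * idx


def bcsr_bytes(n_blocks, r, c, m, val=8, idx=4):
    """Assumed r x c BCSR storage: r*c values per block (explicit zeros included),
    one column index per block, and one pointer per block row (+1)."""
    n_block_rows = -(-m // r)  # ceil(m / r)
    return n_blocks * (r * c * val + idx) + (n_block_rows + 1) * idx


m = 1000
print(csr_bytes(9000, m))           # 112004: 9000 nonzeros stored in CSR
print(bcsr_bytes(1000, 3, 3, m))    # 77340: same nonzeros as 1000 aligned 3x3 blocks
print(bcsr_bytes(4000, 3, 3, m))    # 305340: fill ratio 4 makes BCSR larger than CSR
```

With no fill, blocking saves index storage (one index per block instead of per nonzero); with heavy fill from misaligned blocks, the padded zeros outweigh that saving, which is consistent with the summary's observation that split UBCSR usually reduces storage relative to aligned BCSR.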