Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

We improve the performance of sparse matrix-vector multiplication(SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix...

Full description

Saved in:
Bibliographic Details
Published inHigh Performance Computing and Communications pp. 807 - 816
Main Authors Vuduc, Richard W., Moon, Hyun-Jin
Format Book Chapter Conference Proceeding
LanguageEnglish
Published Berlin, Heidelberg Springer Berlin Heidelberg 2005
Springer
Edition1ère éd
SeriesLecture Notes in Computer Science
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:We improve the performance of sparse matrix-vector multiplication(SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach which stores A in a BCSR can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. We show speedups can be as high as 2.1× over no blocking, and as high as 1.8× over BCSR as used in prior work on a set of application matrices. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.
ISBN:9783540290315
3540290311
ISSN:0302-9743
1611-3349
DOI:10.1007/11557654_91