Technical Report on Hypergraph-Partitioning-Based Models and Methods for Exploiting Cache Locality in Sparse-Matrix Vector Multiplication
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 17.02.2012 |
Subjects | |
Summary: | The sparse matrix-vector multiplication (SpMxV) is a kernel operation widely
used in iterative linear solvers. The same sparse matrix is multiplied by a
dense vector repeatedly in these solvers. Matrices with irregular sparsity
patterns make it difficult to utilize cache locality effectively in SpMxV
computations. In this work, we investigate single- and multiple-SpMxV
frameworks for exploiting cache locality in SpMxV computations. For the
single-SpMxV framework, we propose two cache-size-aware top-down
row/column-reordering methods based on 1D and 2D sparse matrix partitioning by
utilizing the column-net and enhancing the row-column-net hypergraph models of
sparse matrices. The multiple-SpMxV framework depends on splitting a given
matrix into a sum of multiple nonzero-disjoint matrices so that the SpMxV
operation is performed as a sequence of multiple input- and output-dependent
SpMxV operations. For an effective matrix splitting required in this framework,
we propose a cache-size-aware top-down approach based on 2D sparse matrix
partitioning by utilizing the row-column-net hypergraph model. For this
framework, we also propose two methods for effective ordering of individual
SpMxV operations. The primary objective in all three methods is to
maximize the exploitation of temporal locality. We evaluate the validity of our
models and methods on a wide range of sparse matrices using both cache-miss
simulations and actual runs with OSKI. Experimental results show that the
proposed methods and models outperform state-of-the-art schemes. |
---|---|
Bibliography: | BU-CE-1201 |
DOI: | 10.48550/arxiv.1202.3856 |
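
The multiple-SpMxV idea in the summary can be illustrated with a minimal sketch, assuming the matrix is stored in compressed sparse row (CSR) form. The function names `spmv_csr`, `split_csr`, and `multiple_spmv` and the argument `part_of_nonzero` are hypothetical and do not come from the report. The nonzero-to-part assignment is supplied by hand here, whereas the report derives it from cache-size-aware hypergraph partitioning, which this sketch does not implement.

```python
def spmv_csr(row_ptr, col_idx, vals, x, y=None):
    """Accumulate y += A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    if y is None:
        y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] += acc
    return y


def split_csr(row_ptr, col_idx, vals, part_of_nonzero, n_parts):
    """Split A into n_parts nonzero-disjoint CSR matrices A_1 + ... + A_K = A.

    part_of_nonzero[k] is the index of the part that receives the k-th
    stored nonzero (a hand-made assignment in this sketch).
    """
    parts = [([0], [], []) for _ in range(n_parts)]
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            _, cols, data = parts[part_of_nonzero[k]]
            cols.append(col_idx[k])
            data.append(vals[k])
        # Close row i in every part, including parts with no entry in it.
        for ptr, cols, _ in parts:
            ptr.append(len(cols))
    return parts


def multiple_spmv(parts, x, n_rows):
    """Compute y = A @ x as a sequence of SpMxV operations y += A_k @ x."""
    y = [0.0] * n_rows
    for ptr, cols, data in parts:
        spmv_csr(ptr, cols, data, x, y)
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_single = spmv_csr(row_ptr, col_idx, vals, x)
parts = split_csr(row_ptr, col_idx, vals, [0, 1, 0, 1, 0], 2)
y_multi = multiple_spmv(parts, x, 3)
```

Each partial product reads the same input vector `x` and accumulates into the same output vector `y`, which is the sense in which the individual SpMxV operations are input- and output-dependent. The split changes the order in which entries of `x` and `y` are touched without changing the result.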