The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation


Bibliographic Details
Published in	Theory of computing systems Vol. 47; no. 4; pp. 878-919
Main Authors	Chowdhury, Rezaul Alam; Ramachandran, Vijaya
Format	Journal Article
Language	English
Published	01.11.2010
Subjects	Algorithms; Computation; Gaussian elimination; Optimization; Shortest-path problems; Tradeoffs; Transformations

Summary:	We consider triply-nested loops of the type that occur in the standard Gaussian elimination algorithm, which we denote by GEP (or the Gaussian Elimination Paradigm). We present two related cache-oblivious methods I-GEP and C-GEP, both of which reduce the number of cache misses incurred (or I/Os performed) by the computation over that performed by standard GEP by a factor of √M, where M is the size of the cache. Cache-oblivious I-GEP computes in-place and solves most of the known applications of GEP including Gaussian elimination and LU-decomposition without pivoting and Floyd-Warshall all-pairs shortest paths. Cache-oblivious C-GEP uses a modest amount of additional space, but is completely general and applies to any code in GEP form. Both I-GEP and C-GEP produce system-independent cache-efficient code, and are potentially applicable to being used by optimizing compilers for loop transformation. We present parallel I-GEP and C-GEP that achieve good speed-up and match the sequential caching performance cache-obliviously for both shared and distributed caches for sufficiently large inputs. We present extensive experimental results for both in-core and out-of-core performance of our algorithms. We consider both sequential and parallel implementations, and compare them with finely-tuned cache-aware BLAS code for matrix multiplication and Gaussian elimination without pivoting. Our results indicate that cache-oblivious GEP offers an attractive trade-off between efficiency and portability.
ISSN:	1432-4350, 1433-0490
DOI:	10.1007/s00224-010-9273-8
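
Illustration:	The triply-nested loop form that the summary calls GEP can be sketched as follows. This is only the plain iterative loop that I-GEP and C-GEP are described as improving on, not the cache-oblivious algorithms themselves. The names `gep`, `floyd_warshall`, and `eliminate_no_pivot`, and the way the update function `f` and the index predicate `sigma` are passed in, are assumptions made for this sketch rather than code from the article.

```python
from math import inf


def gep(c, f, sigma):
    """Plain triply-nested GEP-style loop over a square matrix c (in place).

    f(x, u, v, w) computes the new c[i][j] from c[i][j], c[i][k],
    c[k][j] and c[k][k]; sigma(i, j, k) says whether the update applies.
    Both parameters are hypothetical names chosen for this sketch.
    """
    n = len(c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if sigma(i, j, k):
                    c[i][j] = f(c[i][j], c[i][k], c[k][j], c[k][k])
    return c


def floyd_warshall(dist):
    """All-pairs shortest paths as an instance of the loop above."""
    return gep(dist,
               lambda x, u, v, w: min(x, u + v),
               lambda i, j, k: True)


def eliminate_no_pivot(a):
    """Gaussian elimination without pivoting as an instance of the loop.

    Only entries with i > k and j > k are updated, so the upper triangle
    ends up holding the eliminated matrix; entries below the diagonal are
    left untouched in this sketch.
    """
    return gep(a,
               lambda x, u, v, w: x - u * v / w,
               lambda i, j, k: i > k and j > k)


if __name__ == "__main__":
    graph = [[0, 3, inf],
             [inf, 0, 1],
             [2, inf, 0]]
    print(floyd_warshall(graph))
```

Passing `f` and `sigma` as functions keeps the loop nest identical across applications, which is the property the summary relies on when it says the methods apply to code in GEP form.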