Efficient compiler and run-time support for parallel irregular reductions

Bibliographic Details
Published in Parallel Computing Vol. 26; no. 13; pp. 1861-1887
Main Authors Han, Hwansoo; Tseng, Chau-Wen
Format Journal Article
Language English
Published Elsevier B.V. 01.12.2000
Summary: Many scientific applications consist of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LocalWrite, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, eliminating the need for buffers or synchronized writes. Computation is replicated if its results are needed on multiple processors. We experimentally evaluate its performance for three irregular codes on a software DSM running on a distributed-memory multiprocessor and two shared-memory multiprocessors while varying connectivity, locality, and adaptivity. Results show that LocalWrite improves performance significantly compared to using replicated buffers, and can match or exceed explicit message-passing gather/scatter for applications with low locality or high adaptivity.
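
The contrast drawn in the summary can be illustrated with a small sketch. The Python below is not the authors' implementation: it is a sequential simulation in which "processors" are plain loop indices, and the edge-list reduction, the cyclic iteration assignment, the block ownership of nodes, and all function names are assumptions made for illustration. It shows only the core idea: with replicated buffers every processor accumulates into a private full-size copy that must be combined afterwards, whereas with an owner-computes partition each processor writes only the nodes it owns and recomputes edges that cross a partition boundary.

```python
# Illustrative, sequential simulation only; all names here are hypothetical.

def replicated_buffers(edges, weights, n_nodes, n_procs):
    """Each simulated processor accumulates into a private full-size buffer;
    the buffers are summed in a separate combine step."""
    buffers = [[0.0] * n_nodes for _ in range(n_procs)]
    for k, (i, j) in enumerate(edges):
        p = k % n_procs              # assumed cyclic assignment of iterations
        f = weights[k]               # stand-in for the per-edge computation
        buffers[p][i] += f
        buffers[p][j] -= f
    # Combine step: on a real machine this is where synchronization is needed.
    return [sum(buf[n] for buf in buffers) for n in range(n_nodes)]


def local_write(edges, weights, n_nodes, n_procs):
    """Each simulated processor updates only the nodes it owns. An edge whose
    endpoints have different owners is computed once by each owner."""
    block = (n_nodes + n_procs - 1) // n_procs   # assumed block partition

    def owner(node):
        return node // block

    y = [0.0] * n_nodes
    for p in range(n_procs):
        # A real system would precompute each processor's iteration list;
        # scanning every edge and filtering is a simplification.
        for k, (i, j) in enumerate(edges):
            if owner(i) != p and owner(j) != p:
                continue
            f = weights[k]           # recomputed on both owners if they differ
            if owner(i) == p:
                y[i] += f            # write to a locally owned node only
            if owner(j) == p:
                y[j] -= f
    replicated = sum(1 for i, j in edges if owner(i) != owner(j))
    return y, replicated
```

Both functions produce the same reduction result. The second needs no private buffers and no combine step, at the cost of computing each boundary-crossing edge twice, which is why the amount of replicated work depends on how well the node partition matches the connectivity of the data.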
ISSN: 0167-8191, 1872-7336
DOI: 10.1016/S0167-8191(00)00062-4