Logically Parallel Communication for Fast MPI+Threads Applications

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 32, No. 12, pp. 3038-3052
Main Authors Zambre, Rohit; Sahasrabudhe, Damodar; Zhou, Hui; Berzins, Martin; Chandramowlishwaran, Aparna; Balaji, Pavan
Format Journal Article
Language English
Published New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2021
ISSN 1045-9219, 1558-2183
DOI 10.1109/TPDS.2021.3075157

Summary: Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional "MPI everywhere" approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe slower performance with MPI+threads, primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI's ordering constraints, to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard's semantics can be limiting. Researchers initially proposed user-visible endpoints to combat this issue, but that solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and gives them many opportunities to express logical communication parallelism. In this article, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing standard and of user-visible endpoints (upper bound). Logical communication parallelism can boost the overall performance of an application by over 2×.
Bibliography: AC02-06CH11357; 1750549
USDOE Office of Science (SC)
National Science Foundation (NSF)
University of Utah