Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential

Bibliographic Details
Published in	International Journal of Parallel Programming, Vol. 48, no. 1, pp. 1-31
Main Authors	Harel, Re’em; Mosseri, Idan; Levin, Harel; Alon, Lee-or; Rusanovsky, Matan; Oren, Gal
Format	Journal Article
Language	English
Published	New York: Springer US, 01.02.2020; Springer Nature B.V.
Subjects	Compilers; Computer architecture; Computer Science; Memory management; Multiprocessing; Processor Architectures; Software Engineering/Programming and Operating Systems; Theory of Computation; AutoPar; Par4All; Parallel programming; Automatic parallelism; Cetus

More Information
Summary:	Parallelization schemes are essential for exploiting the full benefits of multi-core architectures, which have become widespread in recent years, especially for scientific applications. In shared-memory architectures, the most common parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into applications is not always simple, owing to common shared-memory management pitfalls and architecture heterogeneity. To ease this process, many automatic parallelization compilers have been created. In this paper we focus on three source-to-source compilers (AutoPar, Par4All and Cetus) that were found to be the most suitable for the task. We point out their strengths and weaknesses, analyze their performance, inspect their capabilities and suggest new paths for enhancement. We compare the compilers’ performance over several exemplary test cases, each exposing different pitfalls, and suggest several new ways to overcome these pitfalls that yield excellent results in practice. Moreover, we note that all of these source-to-source parallelization compilers operate within the limits of OpenMP 2.5, an outdated version of the API that is no longer well suited to today’s complex heterogeneous architectures. We therefore suggest a path to exploit the new features of OpenMP 4.5, which provides new directives to fully utilize heterogeneous architectures, specifically those with strong collaboration between CPUs and GPGPUs, and which outperforms the previous results by an order of magnitude.
ISSN:	0885-7458 1573-7640
DOI:	10.1007/s10766-019-00640-3