Unite and conquer approach for high scale numerical computing
Published in | Journal of computational science Vol. 14; pp. 5 - 14 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.05.2016 |
Summary: | • We propose an approach to define numerical methods for exascale computational systems. • We propose an adapted programming model for these computing systems. • We point out the characteristics of such numerical methods. • We show that the methods defined according to the unite and conquer approach could achieve our objective.
Exploiting emerging exascale computational systems will require a careful review and redesign of core numerical algorithms and their implementations, so that they take full advantage of the multiple levels of concurrency, hierarchical memory structures and heterogeneous processing units these platforms will offer. This paper presents the “unite and conquer” approach to solving linear systems of equations and eigenvalue problems for extreme scale computing. There are two ways to optimize the execution of a restarted method on a large-scale distributed system. The first is to optimize the number of floating-point operations per restart cycle by maximizing concurrency within the cycle while minimizing latencies. The second is to improve the rate of convergence of a given computational scheme. The unite and conquer restarted approach focuses on decreasing the number of restart cycles by coupling, either synchronously or asynchronously, several restarted methods, also called co-methods. At the end of a restart cycle, each co-method locally gathers the available results of all collaborating co-methods and selects the best one to build its restarting information. This reduces the overall number of cycles needed to reach convergence. The unite and conquer restarted methods are heterogeneous and fault tolerant, support asynchronous communications and offer great potential for load balancing. These properties make them well suited to large-scale multi-level parallel architectures. We show the relevant programming paradigms that allow a multi-level parallel expression of these methods, and how software engineering technology can contribute significantly to achieving high scalability. We present experiments validating the approach for unite and conquer restarted Krylov methods on several parallel and distributed platforms. |
---|---|
ISSN: | 1877-7503 1877-7511 |
DOI: | 10.1016/j.jocs.2016.01.007 |
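
The restart-cycle coupling described in the summary can be illustrated with a small sketch. It is not the paper's implementation: the co-methods here are shifted power iterations rather than restarted Krylov methods, they run one after another in a single process instead of on a distributed system, and the exchange is fully synchronous. All names (`run_cycle`, `unite_and_conquer`, the `(shift, steps)` parameter pairs) are hypothetical and chosen only for illustration. The sketch assumes a symmetric matrix whose dominant eigenvalue is the target and shifts small enough that every co-method converges toward it.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def normalize(x):
    n = norm(x)
    return [v / n for v in x]

def residual(A, x):
    """Rayleigh quotient theta and residual norm ||A x - theta x|| for unit x."""
    Ax = matvec(A, x)
    theta = sum(a * b for a, b in zip(Ax, x))
    return theta, norm([a - theta * b for a, b in zip(Ax, x)])

def run_cycle(A, x, shift, steps):
    """One restart cycle of a toy co-method: `steps` shifted power iterations."""
    for _ in range(steps):
        Ax = matvec(A, x)
        x = normalize([a - shift * b for a, b in zip(Ax, x)])
    theta, res = residual(A, x)
    return {"vector": x, "theta": theta, "residual": res}

def unite_and_conquer(A, x0, co_methods, tol=1e-8, max_cycles=200):
    """Couple several co-methods, each given as a (shift, steps) pair.

    At the end of every cycle all results are gathered, and the result with
    the smallest residual becomes the restart vector of every co-method.
    Returns the eigenvalue estimate and the number of cycles used.
    """
    starts = [normalize(x0) for _ in co_methods]
    for cycle in range(1, max_cycles + 1):
        # In a real setting these cycles would run concurrently.
        results = [run_cycle(A, x, shift, steps)
                   for x, (shift, steps) in zip(starts, co_methods)]
        best = min(results, key=lambda r: r["residual"])
        if best["residual"] < tol:
            return best["theta"], cycle
        starts = [best["vector"] for _ in co_methods]
    return best["theta"], max_cycles

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    x0 = [1.0, 1.0, 1.0]
    united = unite_and_conquer(A, x0, [(0.0, 5), (1.0, 5), (2.0, 5)])
    alone = unite_and_conquer(A, x0, [(0.0, 5)])
    print("united (theta, cycles):", united)
    print("single (theta, cycles):", alone)
```

Running the script compares three coupled co-methods against a single unshifted one on a 3×3 test matrix. Selecting the smallest residual is only one possible notion of the "best" result; an asynchronous variant would instead use whatever results happen to be available when a co-method finishes its cycle.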