Parallel Algorithms for Successive Convolution

The development of modern computing architectures with ever-increasing amounts of parallelism has allowed for the solution of previously intractable problems across a variety of scientific disciplines. Despite these advances, multiscale computing problems continue to pose an incredible challenge to...

Full description

Saved in:

Bibliographic Details
Published in	Journal of scientific computing Vol. 86; no. 1; p. 1
Main Authors	Christlieb, Andrew J., Guthrey, Pierson T., Sands, William A., Thavappiragasm, Mathialakan
Format	Journal Article
Language	English
Published	New York Springer US 2021 Springer Nature B.V
Subjects	Algorithms Approximation Boundary conditions Computation Computational Mathematics and Numerical Analysis Decomposition Distributed memory Domain decomposition methods Integral equations Iterative methods Linear systems Mathematical and Computational Engineering Mathematical and Computational Physics Mathematics Mathematics and Statistics Methods Operators (mathematics) Optimization Partial differential equations Theoretical Integral solution Numerical analysis High-performance computing Domain decomposition Method-of-lines-transpose Fast algorithms
Online Access	Get full text

Cover

Loading…

More Information
Summary:	The development of modern computing architectures with ever-increasing amounts of parallelism has allowed for the solution of previously intractable problems across a variety of scientific disciplines. Despite these advances, multiscale computing problems continue to pose an incredible challenge to modern architectures because they require resolving scales that often vary by orders of magnitude in both space and time. Such complications have led us to consider alternative discretizations for partial differential equations (PDEs) which use expansions involving integral operators to approximate spatial derivatives (Christlieb et al. in J Comput Phys 379:214–236, 2019; Christlieb et al. J Sci Comput 82:52(3):1–29, 2020; Christlieb et al. J Comput Phys 415:1–25, 2020). These constructions use explicit information within the integral terms, but treat boundary data implicitly, which contributes to the overall speed of the method. This approach is provably unconditionally stable for linear problems and stability has been demonstrated experimentally for nonlinear problems. Additionally, it is matrix-free in the sense that it is not necessary to invert linear systems and iteration is not required for nonlinear terms. Moreover, the scheme employs a fast summation algorithm that yields a method with a computational complexity of O ( N ) , where N is the number of mesh points along a coordinate direction. While much work has been done to explore the theory behind these methods, their practicality in large scale computing environments is a largely unexplored topic. In this work, we explore the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems along with shared memory algorithms. As a first pass, we derive an artificial Courant–Friedrichs–Lewy condition that enforces a nearest-neighbor (N-N) communication pattern and briefly discuss possible generalizations. We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos (Edwards and Trott in J Parallel Distrib Comput 74:3202–3216, 2014) for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate > 1 × 10 8 DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms using several different PDE test problems, including a nonlinear example, which employs an adaptive time-stepping rule.
ISSN:	0885-7474 1573-7691
DOI:	10.1007/s10915-020-01359-x