From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming
Du, Peng, Weber, Rick, Luszczek, Piotr, Tomov, Stanimire, Peterson, Gregory, Dongarra, Jack
Published in Parallel computing (01.08.2012)
Journal Article
Reducing the amount of out‐of‐core data access for GPU‐accelerated randomized SVD
Lu, Yuechao, Yamazaki, Ichitaro, Ino, Fumihiko, Matsushita, Yasuyuki, Tomov, Stanimire, Dongarra, Jack
Published in Concurrency and computation (10.10.2020)
Journal Article
Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs
Yamazaki, Ichitaro, Tomov, Stanimire, Dong, Tingxing, Dongarra, Jack
Published in High Performance Computing for Computational Science -- VECPAR 2014
Book Chapter
Non‐GPU‐resident symmetric indefinite factorization
Yamazaki, Ichitaro, Tomov, Stanimire, Dongarra, Jack
Published in Concurrency and computation (10.03.2017)
Journal Article
Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices
Dong, Tingxing, Haidar, Azzam, Tomov, Stanimire, Dongarra, Jack
Published in Procedia computer science (2017)
Journal Article
Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware
Sorna, Anumeena, Cheng, Xiaohe, D'Azevedo, Eduardo, Won, Kwai, Tomov, Stanimire
Published in 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW) (01.12.2018)
Conference Proceeding
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications
Luszczek, Piotr, Abdelfattah, Ahmad, Anzt, Hartwig, Suzuki, Atsushi, Tomov, Stanimire
Published in Future generation computer systems (01.11.2024)
Journal Article
Solving Linear Diophantine Systems on Parallel Architectures
Zaitsev, Dmitry, Tomov, Stanimire, Dongarra, Jack
Published in IEEE transactions on parallel and distributed systems (01.05.2019)
Journal Article
Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation
Ayala, Alan, Tomov, Stanimire, Luo, Xi, Shaiek, Hejer, Haidar, Azzam, Bosilca, George, Dongarra, Jack
Published in 2019 IEEE/ACM Workshop on Exascale MPI (ExaMPI) (01.11.2019)
Conference Proceeding
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
Haidar, Azzam, Bayraktar, Harun, Tomov, Stanimire, Dongarra, Jack, Higham, Nicholas J.
Published in Proceedings of the Royal Society. A, Mathematical, physical, and engineering sciences (01.11.2020)
Journal Article