Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors

Bibliographic Details
Published in: Parallel Computing Technologies, pp. 125-139
Main Authors: Afanasyev, Ilya V.; Voevodin, Vadim V.; Voevodin, Vladimir V.; Komatsu, Kazuhiko; Kobayashi, Hiroaki
Format: Book Chapter
Language: English
Published: Cham, Springer International Publishing
Series: Lecture Notes in Computer Science

Summary: This paper presents a comprehensive analysis of the main SIMD-processing features and computational characteristics of three high-performance architectures: two NVIDIA GPU architectures (the Pascal and Volta generations) and the NEC SX-Aurora TSUBASA vector processor. Since both types of architecture rely strongly on SIMD processing, certain similarities in their data-processing principles can be found. However, although vectorised data processing is present in both NVIDIA GPUs and NEC SX-Aurora TSUBASA, the vectorisation features of the two architectures are implemented in completely different ways. These differences impose several fundamental restrictions on the classes of algorithms that can be implemented efficiently on each platform. This paper investigates the possibility of porting various classes of programs and algorithms between the discussed architectures, with a focus on utilising all available vectorisation features. This problem cannot be approached without a detailed analysis of the similarities and differences between the SIMD-processing features of these architectures. The analysis allowed us to identify several important examples of typical applications and algorithms, including reduction operations, programs relying on frequent indirect memory accesses, and data transfers through the co-processor interconnect. Some of them demonstrated comparable efficiency on NVIDIA GPUs and NEC SX-Aurora TSUBASA vector processors, while others showed different efficiency. Moreover, the analysis makes it easy to extend this set of examples in order to approach the problem of automated porting of programs between the reviewed architectures, which we consider an important direction for our future research.
ISBN: 3030256359, 9783030256357
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-25636-4_10