Comparison of Parallel Implementation Strategies in GPU-Accelerated System-on-Chip Under Proton Irradiation

Bibliographic Details
Published in	IEEE Transactions on Nuclear Science, Vol. 69, no. 3, pp. 444-452
Main Authors	Badia, Jose M., Leon, German, Belloch, Jose A., Garcia-Valderas, Mario, Lindoso, Almudena, Entrena, Luis
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2022
Subjects	Algorithms; Central processing units; Commercial off-the-shelf technology; Computer applications; Computer architecture; CPUs; Embedded systems; Errors; graphics processing unit (GPU); Graphics processing units; Instruction sets; Irradiation; Microprocessors; Multiplication; parallelization; Performance evaluation; Power consumption; Proton irradiation; Protons; Radiation; Radiation effects; Sensitivity; System on chip

More Information
Summary:	Commercial off-the-shelf (COTS) systems-on-chip (SoCs) are becoming widespread in embedded systems. Many of them include a multicore central processing unit (CPU) and a high-end graphics processing unit (GPU), combining high computational performance with low power consumption and flexible multilevel parallelism. Devices of this kind are also being considered for radiation environments where large amounts of data must be processed or compute-intensive applications must be executed. In this article, we compare three strategies for performing matrix multiplication on the GPU of a Tegra TK1 SoC. Our aim is to analyze how different uses of the GPU's resources influence not only the computational performance of the algorithm but also its radiation sensitivity. Proton irradiation experiments were performed to compare the behavior of the three strategies. Experimental results show that most of the errors force a reboot of the platform. The number of errors is directly related to how the algorithms use the internal memories of the GPU, and it increases with the matrix size. It is also related to the number of transactions with the global memory, which in our experiments is not affected by the radiation. Results show that the smallest cross section is obtained with the fastest algorithm, even though it uses the cores of the GPU more intensively.
ISSN:	0018-9499 1558-1578
DOI:	10.1109/TNS.2021.3128722
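
Illustrative note on the summary: the abstract links the error count to the number of global-memory transactions and reports results as a cross section, but it does not describe the three strategies themselves. The sketch below is therefore only a generic illustration, not the authors' implementation. It assumes a blocked (tiled) matrix multiplication in plain Python, in which staging a tile models a copy into on-chip memory, and it uses the conventional definition of cross section as observed errors divided by particle fluence. The names `matmul_global_loads` and `cross_section`, and all numbers in the usage example, are hypothetical.

```python
def matmul_global_loads(a, b, tile):
    """Multiply square matrices a and b (lists of lists) block by block.

    Returns (c, loads). `loads` counts element reads from the operand
    matrices and stands in for global-memory transactions (an assumption
    of this sketch). tile=1 corresponds to no on-chip reuse.
    """
    n = len(a)
    if n % tile != 0:
        raise ValueError("tile must divide the matrix size")
    c = [[0.0] * n for _ in range(n)]
    loads = 0
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Stage one tile of each operand (models a shared-memory copy).
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                loads += 2 * tile * tile
                # Every product below reuses the staged tiles.
                for i in range(tile):
                    for j in range(tile):
                        acc = 0.0
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c, loads


def cross_section(errors, fluence):
    """Cross section in cm^2: observed errors per unit fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence


if __name__ == "__main__":
    n = 4
    a = [[float(i * n + j) for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for tile in (1, 2, 4):
        _, loads = matmul_global_loads(a, identity, tile)
        print(f"tile={tile}: {loads} operand loads")
    # Hypothetical numbers, for illustration only.
    print(cross_section(10, 1e10), "cm^2")
```

In this model the load count is 2n³/tile, so larger tiles trade global-memory traffic for more data held in on-chip storage. That is the kind of resource trade-off the summary refers to, although the actual strategies compared in the article may differ.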