Vector data flow analysis for SIMD optimizations on OpenCL programs

Bibliographic Details
Published in Concurrency and computation Vol. 28; no. 5; pp. 1629 - 1654
Main Authors Lin, Yu-Te, Lee, Jenq-Kuen
Format Journal Article
Language English
Published Blackwell Publishing Ltd 10.04.2016
Summary: Multi‐core systems equipped with micro processing units and accelerators such as digital signal processors (DSPs) and graphics processing units (GPUs) have become a major trend in processor design in recent years, in an attempt to meet ever‐increasing application performance requirements. Open Computing Language (OpenCL) is one of the programming languages that include new extensions proposed to exploit the computing power of these kinds of processors. Among the newly extended language features, single‐instruction multiple‐data (SIMD) linguistics and vector types are added to OpenCL to exploit hardware features of the accelerators. This addition makes it necessary to consider how traditional compiler data flow analysis can be adapted to meet the optimization requirements of vector linguistics. In this paper, we propose a calculus framework to support the data flow analysis of vector constructs for OpenCL programs, which compilers can use to perform SIMD optimizations. We model OpenCL vector operations as data access functions in the style of mathematical functions. We then show that the data flow analysis for OpenCL vector linguistics can be performed based on the data access functions. Based on the information gathered from data flow analysis, we illustrate a set of SIMD optimizations on OpenCL programs. Experimental results incorporating our calculus and our proposed compiler optimizations show that the proposed SIMD optimizations provide average performance improvements of 22% on x86 CPUs and 4% on Advanced Micro Devices (AMD) GPUs. Of the 15 selected benchmarks, 11 are improved on x86 CPUs and six are improved on AMD GPUs. The proposed framework has the potential to be used to construct other SIMD optimizations on OpenCL programs. Copyright © 2015 John Wiley & Sons, Ltd.
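Illustration: the summary describes modelling OpenCL vector operations as data access functions. The sketch below is one possible reading of that idea, not the calculus defined in the paper. It treats a four‐lane swizzle such as v.wzyx as a function from result lanes to source lanes, and composes two such functions so that a chain of swizzles collapses into one (or into none, if the result is the identity). All names (access_function, compose, is_identity, to_swizzle) are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: the function names and the tuple encoding are
# assumptions made for this example, not notation taken from the paper.

LANES = "xyzw"  # component names of a 4-lane OpenCL vector


def access_function(swizzle):
    """Map each result lane to the source lane it reads.

    Example: 'wzyx' -> (3, 2, 1, 0), i.e. result lane 0 reads source lane 3.
    """
    return tuple(LANES.index(c) for c in swizzle)


def compose(outer, inner):
    """Access function of (v.inner).outer expressed directly on v.

    Result lane i reads lane outer[i] of the intermediate vector, which in
    turn reads lane inner[outer[i]] of v.
    """
    return tuple(inner[j] for j in outer)


def is_identity(f):
    """True if the access function reads every lane from its own position."""
    return f == tuple(range(len(f)))


def to_swizzle(f):
    """Turn an access function back into a swizzle suffix such as 'zyxw'."""
    return "".join(LANES[j] for j in f)


reverse = access_function("wzyx")
rotate = access_function("yzwx")

# (v.wzyx).wzyx reads every lane from its original position, so a compiler
# could drop both swizzles.
double_reverse = compose(reverse, reverse)

# (v.wzyx).yzwx collapses into the single swizzle v.zyxw.
merged = compose(rotate, reverse)
```

Under these assumptions, is_identity(double_reverse) holds and to_swizzle(merged) gives "zyxw", which is the kind of fact a data flow analysis over vector operations could use to remove or merge redundant lane shuffles.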
Bibliography: This article is an extension of a conference paper presented at the 16th Workshop on Compilers for Parallel Computing.
ArticleID:CPE3714
istex:B139A8758D05CC8AB78C4E89E437F6EF7E0543D9
ark:/67375/WNG-LQR4QM68-Z
ISSN:1532-0626, 1532-0634
DOI:10.1002/cpe.3714