Using OpenCL: programming massively parallel computers

In 2011 many computer users were exploring the opportunities and the benefits of the massive parallelism offered by heterogeneous computing. In 2000 the Khronos Group, a not-for-profit industry consortium, was founded to create standard open APIs for parallel computing, graphics and dynamic media. A...

Bibliographic Details
Main Authors	Kowalik, Janusz S, Puzniakowski, Tadeusz
Format	eBook
Language	English
Published	Amsterdam: IOS Press, 2012; SAGE Publications, Limited
Edition	1
Subjects	OpenCL (Computer program language); Parallel computers; Parallel programming (Computer science)
ISBN	1614990298; 9781614990291
DOI	10.3233/978-1-61499-030-7-i

Table of Contents:

Title Page -- Preface -- Contents -- Introduction -- Existing Standard Parallel Programming Systems -- MPI -- OpenMP -- Two Parallelization Strategies: Data Parallelism and Task Parallelism -- Data Parallelism -- Task Parallelism -- Example -- History and Goals of OpenCL -- Origins of Using GPU in General Purpose Computing -- Short History of OpenCL -- Heterogeneous Computer Memories and Data Transfer -- Heterogeneous Computer Memories -- Data Transfer -- The Fourth Generation CUDA -- Host Code -- Phase a. Initialization and Creating Context -- Phase b. Kernel Creation, Compilation and Preparations for Kernel Execution -- Phase c. Creating Command Queues and Kernel Execution -- Finalization and Releasing Resource -- Applications of Heterogeneous Computing -- Accelerating Scientific/Engineering Applications -- Conjugate Gradient Method -- Jacobi Method -- Power Method -- Monte Carlo Methods -- Conclusions -- Benchmarking CGM -- Introduction -- Additional CGM Description -- Heterogeneous Machine -- Algorithm Implementation and Timing Results -- Conclusions -- OpenCL Fundamentals -- OpenCL Overview -- What is OpenCL -- CPU + Accelerators -- Massive Parallelism Idea -- Work Items and Workgroups -- OpenCL Execution Model -- OpenCL Memory Structure -- OpenCL C Language for Programming Kernels -- Queues, Events and Context -- Host Program and Kernel -- Data Parallelism in OpenCL -- Task Parallelism in OpenCL -- How to Start Using OpenCL -- Header Files -- Libraries -- Compilation -- Platforms and Devices -- OpenCL Platform Properties -- Devices Provided by Platform -- OpenCL Platforms - C++ -- OpenCL Context to Manage Devices -- Different Types of Devices -- CPU Device Type -- GPU Device Type -- Accelerator -- Different Device Types - Summary -- Context Initialization - by Device Type -- Context Initialization - Selecting Particular Device
Getting Information about Context -- OpenCL Context to Manage Devices - C++ -- Error Handling -- Checking Error Codes -- Using Exceptions - Available in C++ -- Using Custom Error Messages -- Command Queues -- In-order Command Queue -- Out-of-order Command Queue -- Command Queue Control -- Profiling Basics -- Profiling Using Events - C example -- Profiling Using Events - C++ example -- Work-Items and Work-Groups -- Information About Index Space from a Kernel -- NDRange Kernel Execution -- Task Execution -- Using Work Offset -- OpenCL Memory -- Different Memory Regions - the Kernel Perspective -- Relaxed Memory Consistency -- Global and Constant Memory Allocation - Host Code -- Memory Transfers - the Host Code -- Programming and Calling Kernel -- Loading and Compilation of an OpenCL Program -- Kernel Invocation and Arguments -- Kernel Declaration -- Supported Scalar Data Types -- Vector Data Types and Common Functions -- Synchronization Functions -- Counting Parallel Sum -- Parallel Sum - Kernel -- Parallel Sum - Host Program -- Structure of the OpenCL Host Program -- Initialization -- Preparation of OpenCL Programs -- Using Binary OpenCL Programs -- Computation -- Release of Resources -- Structure of OpenCL host Programs in C++ -- Initialization -- Preparation of OpenCL Programs -- Using Binary OpenCL Programs -- Computation -- Release of Resources -- The SAXPY Example -- Kernel -- The Example SAXPY Application - C Language -- The example SAXPY application - C++ language -- Step by Step Conversion of an Ordinary C Program to OpenCL -- Sequential Version -- OpenCL Initialization -- Data Allocation on the Device -- Sequential Function to OpenCL Kernel -- Loading and Executing a Kernel -- Gathering Results -- Matrix by Vector Multiplication Example -- The Program Calculating matrix times vector -- Performance -- Experiment -- Conclusions
Advanced OpenCL -- OpenCL Extensions -- Different Classes of Extensions -- Detecting Available Extensions from API -- Using Runtime Extension Functions -- Using Extensions from OpenCL Program -- Debugging OpenCL codes -- Printf -- Using GDB -- Performance and Double Precision -- Floating Point Arithmetics -- Arithmetics Precision - Practical Approach -- Profiling OpenCL Application -- Using the Internal Profiler -- Using External Profiler -- Effective Use of Memories - Memory Access Patterns -- Matrix Multiplication - Optimization Issues -- OpenCL and OpenGL -- Extensions Used -- Libraries -- Header Files -- Common Actions -- OpenGL Initialization -- OpenCL Initialization -- Creating Buffer for OpenGL and OpenCL -- Kernel -- Generating Effect -- Running Kernel that Operates on Shared Buffer -- Results Display -- Message Handling -- Cleanup -- Notes and Further Reading -- Case Study - Genetic Algorithm -- Historical Notes -- Terminology -- Genetic Algorithm -- Example Problem Definition -- Genetic Algorithm Implementation Overview -- OpenCL Program -- Most Important Elements of Host Code -- Summary -- Experiment Results -- Comparing CUDA with OpenCL -- Introduction to CUDA -- Short CUDA Overview -- CUDA 4.0 Release and Compatibility -- CUDA Versions and Device Capability -- CUDA Runtime API Example -- CUDA Program Explained -- Blocks and Threads Indexing Formulas -- Runtime Error Handling -- CUDA Driver API Example -- Theoretical Foundations of Heterogeneous Computing -- Parallel Computer Architectures -- Clusters and SMP -- DSM and ccNUMA -- Parallel Chip Computer -- Performance of OpenCL Programs -- Combining MPI with OpenCL -- Matrix Multiplication - Algorithm and Implementation -- Matrix Multiplication -- Implementation -- OpenCL Kernel -- Initialization and Setup -- Kernel Arguments -- Executing Kernel -- Using Examples Attached to the Book -- Compilation and Setup -- Linux -- Windows -- Bibliography and References