Book Review: OpenCL Programming Guide

asgard4 writes "In recent years GPUs have become powerful computing devices whose power is used not only to generate pretty graphics on screen but also to perform heavy computation jobs that in the past were reserved exclusively for high-performance supercomputers. Considering the vast diversity and rapid development cycle of GPUs from different vendors, it is not surprising that the ecosystem of programming environments has flourished fairly quickly as well, with multiple vendors, such as NVIDIA, AMD, and Microsoft, all coming up with their own solutions for programming GPUs for more general-purpose computing (abbreviated GPGPU) applications. With OpenCL (short for Open Computing Language) the Khronos Group provides an industry standard for programming heavily parallel, heterogeneous systems, including a C-like language for writing so-called kernels. The OpenCL Programming Guide gives you all the necessary knowledge to get started developing high-performing, parallel applications for such systems with OpenCL 1.1." Keep reading for the rest of asgard4's review.
OpenCL Programming Guide
author Aaftab Munshi, Benedict R. Gaster, Timothy G. Mattson, James Fung, Dan Ginsburg
pages 603
publisher Addison-Wesley Pearson Education
rating 9/10
reviewer asgard4
ISBN 0321749642
summary A solid introduction to programming with OpenCL.
The authors of the book certainly know what they are talking about. Most of them have been involved in the standardization effort that went into OpenCL. Munshi, for example, is the editor of the OpenCL specification. So all the information in the book is first-hand knowledge from experts in OpenCL. The reader is expected to be familiar with the C programming language and basic programming concepts. Some experience in parallelizing problems is a benefit but not a requirement.

The book consists of two major parts. The first part is a detailed description of the OpenCL C language and the API used by the host to control the execution of programs written in that language. The second part comprises various case studies that show OpenCL in action.
The authors get straight to the point in the introduction, discussing the conceptual foundations of OpenCL in detail. They explain what kernels are (basically functions that are scheduled for execution on a compute device), how the kernel execution model works, how the host manages the command queues that schedule memory transfers or kernel execution on compute devices, and the memory model.
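
To make the kernel idea concrete, here is a minimal sketch in plain C rather than OpenCL C, so that it runs without an OpenCL runtime. The names `vec_add_work_item` and `run_vec_add` are illustrative and do not come from the book. The loop merely stands in for the device scheduling one work-item per index, and the explicit `gid` parameter stands in for OpenCL's `get_global_id(0)`.

```c
#include <stddef.h>

/* In OpenCL C the kernel would look roughly like this:
 *
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *out)
 *   {
 *       size_t i = get_global_id(0);
 *       out[i] = a[i] + b[i];
 *   }
 */

/* Plain-C stand-in for the kernel body: one call handles one work-item.
 * The gid parameter replaces get_global_id(0). */
void vec_add_work_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Hypothetical helper emulating a 1-D kernel launch. A device would run
 * the work-items in parallel; here they run one after another. */
void run_vec_add(size_t global_size, const float *a, const float *b, float *out)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        vec_add_work_item(gid, a, b, out);
    }
}
```

Calling `run_vec_add(4, a, b, out)` on two four-element arrays fills `out` with their element-wise sums. On a real device the host would instead enqueue the kernel on a command queue, and the work-items would execute concurrently.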

While this first chapter is all prose, the second chapter dives right in with some code and a first HelloWorld example. The following chapters introduce more and more of the OpenCL language and API step-by-step. All API functions are described in somewhat of a reference style with a lot of detail, including possible error codes. However, the text is not a reference. There is always a good explanation with examples or short code listings, the only notable exception being chapter three, which presents the OpenCL C language. A few more examples would have made the text less dry in this chapter.

An important chapter is chapter nine on events and synchronization between multiple compute devices and the host. This chapter is important because — as any experienced parallel programmer knows — getting synchronization right is often tricky but obviously essential for correct execution of a parallel program.

An interesting feature in OpenCL is the built-in interoperability with OpenGL and, surprisingly, Direct3D. Various functions in the OpenCL API allow creating buffers from OpenGL/Direct3D objects, such as textures or vertex buffers, that can be used by an OpenCL kernel. This opens up interesting possibilities for doing a lot more work on the GPU in graphics applications, such as running a fluid simulation on the GPU in OpenCL, which directly writes its results into vertex buffers or textures to be used directly for rendering without the host CPU having to intervene.

Before delving into the case studies the book briefly discusses the embedded profile that is available for OpenCL and the standardized C++ API that the Khronos Group provides in addition to the regular OpenCL API (which is defined exclusively as C functions). The C++ API makes using some of the OpenCL objects a little bit easier and somewhat nicer.

The second part of the book contains various interesting case studies that show off what OpenCL can be used for, such as computing a Sobel filter or a histogram for an image, computing FFTs, doing cloth simulation, or multiplying dense and sparse matrices. The choice and variety of case studies is definitely interesting, and most will be immediately applicable to the reader when developing applications using OpenCL. All the code for the examples and the case studies in the book is available for download on the book's website.
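
As a rough illustration of why a Sobel filter suits this model, the sketch below computes it in plain C with one independent computation per pixel. It is not the book's listing: the function names, the row-major 8-bit grayscale layout, the zeroed border, and the use of |gx| + |gy| in place of the Euclidean magnitude are all assumptions made for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* Gradient estimate for one interior pixel (x, y) of a row-major 8-bit
 * grayscale image. Each call reads only the 3x3 neighborhood of its own
 * pixel, so every pixel could be handled by a separate work-item. */
int sobel_at(const unsigned char *img, size_t width, size_t x, size_t y)
{
    const unsigned char *r0 = img + (y - 1) * width + x; /* row above */
    const unsigned char *r1 = img + y * width + x;       /* current row */
    const unsigned char *r2 = img + (y + 1) * width + x; /* row below */

    /* Horizontal gradient: right column minus left column. */
    int gx = (r0[1] + 2 * r1[1] + r2[1]) - (r0[-1] + 2 * r1[-1] + r2[-1]);
    /* Vertical gradient: bottom row minus top row. */
    int gy = (r2[-1] + 2 * r2[0] + r2[1]) - (r0[-1] + 2 * r0[0] + r0[1]);

    return abs(gx) + abs(gy);
}

/* Applies sobel_at to every interior pixel; border pixels are set to 0. */
void sobel_filter(const unsigned char *img, int *out, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            int border = (x == 0 || y == 0 || x + 1 == width || y + 1 == height);
            out[y * width + x] = border ? 0 : sobel_at(img, width, x, y);
        }
    }
}
```

In an OpenCL version the two nested loops would disappear: the host would launch a two-dimensional range of work-items and each one would evaluate the body for its own `(x, y)`.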

Overall, the OpenCL Programming Guide succeeds in being a great introduction to OpenCL 1.1. The book covers all of the specification and more, has an easy-to-read writing style, and yet provides all the necessary details to be an all-encompassing guide to OpenCL. The good selection of case studies makes the book even more appealing and demonstrates what can be done with real-life OpenCL code (and also how it needs to be optimized to get the best performance out of current OpenCL platforms, such as GPUs).

Martin Ecker has been involved in real-time graphics programming for more than 15 years and works as a professional game developer for Sony Computer Entertainment America in sunny San Diego, California.

You can purchase OpenCL Programming Guide from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


  • Great (Score:5, Informative)

    by WilyCoder ( 736280 ) on Friday January 20, 2012 @07:07PM (#38768930)

    I read this book back in August. I've been using OpenGL for almost 10 years now but knew little to nothing about OpenCL.

    This book was really good. There were some typos that I found while reading it (other people had already found and reported them). If you get this book make sure you visit the author's addendum & corrections page.

    I agree with the review, 9/10. If there were NO typos at all, it would be 10/10 for me.

  • Re:Ordinary Mortals (Score:4, Informative)

    by UnknownSoldier ( 67820 ) on Friday January 20, 2012 @07:31PM (#38769226)

    Why don't you start with ShaderToy ?
    http://www.iquilezles.org/apps/shadertoy/ [iquilezles.org]

    And some interesting code snippets ...
    http://www.reddit.com/r/programming/comments/losip/shader_toy/ [reddit.com]

    Reddit is the Digg of /. -- group herd-think, circle jerking, wankers, and the rare insightful / informative comment.

  • Re:Ordinary Mortals (Score:5, Informative)

    by Anonymous Coward on Friday January 20, 2012 @08:52PM (#38770286)


    A CPU core is an assembly line: if you have a quad-core system, you have 4 assembly lines, and they may be very long. Those 4 assembly lines don't get to talk to each other except at either end. They can all be doing the same activity, or different activities, and they operate asynchronously. When they finish what they are doing, they wipe out the assembly line.
    GPUs are synchronous and parallel. Every assembly line in a GPU can only run the same instruction code until cleared. So if there are 2048 assembly lines, each of them executes the same instructions on different pieces of data.

    So in principle, if you can't parallelize it (eg zlib), it is better run on the CPU. If it can be parallelized (image, video and sound compression, FFT, specific math functions) you can run it on the GPU.

    What we haven't done yet is discover any lossless, parallelizable compression schemes. The problem is that the more fragments you break the data into, the less compression you can achieve, because compression is purely serial. Lossy compression, however, is not serial: you can go "here's a 64x64 block of data, compress it", and it will do that on the entire image at once, because those 64x64 blocks don't rely on the compression of any of the other blocks in the image. The compression code may be a simple XOR or a motion vector against the previous image. It can't rely on the neighboring 64x64 blocks.
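
    A minimal sketch of the block-independence point, in plain C: each tile is XORed against the same tile of the previous frame and touches nothing outside itself, so the tiles could be processed in any order or all at once. The function names, the square tile size, and the assumption that the frame dimensions are multiples of the tile size are illustrative only. An XOR delta is just a preprocessing step here, not a complete codec.

```c
#include <stddef.h>

/* XORs one block x block tile of cur against the same tile of prev.
 * Reads and writes only pixels inside tile (bx, by). */
void xor_block(const unsigned char *prev, const unsigned char *cur,
               unsigned char *delta, size_t width,
               size_t bx, size_t by, size_t block)
{
    for (size_t y = by * block; y < (by + 1) * block; ++y) {
        for (size_t x = bx * block; x < (bx + 1) * block; ++x) {
            size_t i = y * width + x;
            delta[i] = (unsigned char)(prev[i] ^ cur[i]);
        }
    }
}

/* Processes every tile of the frame. Assumes width and height are
 * multiples of block. On a GPU each (bx, by) pair could be handed to a
 * separate work-group, since no tile depends on another. */
void xor_frame(const unsigned char *prev, const unsigned char *cur,
               unsigned char *delta, size_t width, size_t height, size_t block)
{
    for (size_t by = 0; by < height / block; ++by) {
        for (size_t bx = 0; bx < width / block; ++bx) {
            xor_block(prev, cur, delta, width, bx, by, block);
        }
    }
}
```

    Applying `xor_frame` a second time, with the delta in place of the current frame, reproduces the current frame, because XOR is its own inverse.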

    This is why you see "accelerated" video tear: it doesn't wait for all the fragments in the frame to complete before flipping the video buffer. Adobe Flash is especially guilty of this; on dual-core and quad-core CPU systems you'll see screen tearing because Flash assumes it has 100% use of the CPU, even though that same CPU is doing other stuff. If Flash used the GPU, it would suffer the same problem, since the GPU is also used by the accelerated composited desktop in Windows Vista and 7.

    Anyway. CPU programming and GPU programming are completely different animals.

    One thing that GPUs have high potential for is independent computations. For example, back in 1992, if you were playing a game, the game could only compute the NPCs that were just off the screen. Today, you could use the GPU to compute all the NPCs' positions simultaneously. This is currently done with physics computations. Not simply doing "AI" on the GPU, but actually creating neural networks for many NPCs to react to the player character, not just a simple "is PC visible, shoot it."

  • Re:Ordinary Mortals (Score:5, Informative)

    by bored ( 40072 ) on Friday January 20, 2012 @09:27PM (#38770666)

    Coding for it in OpenCL isn't much different than writing C code that is just a wrapper around some assembly. There is no reason a MUCH more human friendly interface couldn't be made with the compiler taking care of using the appropriate memory and instructions to optimize for GPU usage.

    As someone who has actually done some OpenCL programming, I can tell you why you're wrong. Learning OpenCL syntax isn't hard; if you know C# you can probably write some useful OpenCL code in just an hour or two. It is, after all, a C-like language, just as C# is a C-like language.

    That said, don't expect your OpenCL code to run faster than similar C code compiled with SSE. That's because making OpenCL run fast is an exercise in looking at memory access patterns, understanding how to share data between hundreds of threads efficiently, etc. My first OpenCL program was actually slower (by 1/2) than a similar program using all 8 cores of my CPU. I got it on par with the CPU using a top-of-the-line AMD GPU within a day or so, and then spent another two weeks trying different things until finally finding the magic bullet, which removed a memory collision I was having and by itself increased the performance of my routine by ~32x. Running the same code on an NVIDIA GPU put me back in the ballpark of my CPUs again, requiring more time to make it fast on those GPUs. Time I wasn't willing to spend.

    The bottom line is that OpenCL could be any language, but what is necessary is the ability to make changes which affect how data is laid out in memory, and how that data is being read/written. Furthermore, you need the ability to specify where the memory is used, because GPUs have an unforgiving memory hierarchy. So if you're not comfortable with the nitty-gritty details of how computers (or in this case GPUs) actually work (not some CompSci abstraction), you're not going to write good OpenCL code. You also need a gut feeling for how fast something could be, based on the specification of a particular device. Otherwise you won't know when to give up.
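
    One common example of such a layout change, sketched here in plain C under assumed names (`ParticleAoS`, `ParticlesSoA`, and `aos_to_soa` are hypothetical), is switching from an array of structs to a struct of arrays. With the second layout, neighboring work-items that each read one `x` value touch adjacent addresses instead of addresses a whole struct apart. Whether this actually helps depends on the device and the access pattern.

```c
#include <stddef.h>

/* Array-of-structs layout: the x values of consecutive particles are
 * separated by the other fields of the struct. */
typedef struct {
    float x, y, z, mass;
} ParticleAoS;

/* Struct-of-arrays layout: each field is stored contiguously.
 * The caller owns the four arrays, each with room for n floats. */
typedef struct {
    float *x;
    float *y;
    float *z;
    float *mass;
} ParticlesSoA;

/* Copies n particles from the AoS layout into the SoA layout. */
void aos_to_soa(const ParticleAoS *in, ParticlesSoA *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
        out->mass[i] = in[i].mass;
    }
}
```

    After the conversion, a kernel that only needs positions can read `x`, `y`, and `z` without dragging `mass` through the memory system as well.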

  • by bored ( 40072 ) on Friday January 20, 2012 @09:38PM (#38770764)

    Is that it's not really useful for learning OpenCL. Sure, it will teach you the syntax and how to write an OpenCL program. That isn't the problem. The problem is that if you're writing something in OpenCL you probably want it to be fast. Learning the language is doable by someone with C experience in just a couple of hours with just the SDKs shipped by AMD/NVIDIA/Intel. Learning how to optimize a routine for a particular GPU/etc. is the hard part, and is application specific. It also requires knowledge of how compute devices actually work at an extremely low level. I don't believe this book teaches that. Save your money, download the spec and an SDK for your device. Start reading the architecture docs.
