Introduction

This set of freely available OpenCL exercises and solutions, together with the HandsOnOpenCL slides, has been created by Simon McIntosh-Smith and Tom Deakin from the University of Bristol in the UK, with financial support from the Khronos Initiative for Training and Education (KITE) to promote the use of open standards.

Simon McIntosh-Smith is one of the foremost OpenCL trainers in the world, having taught the subject since 2009. He has run many OpenCL training courses at conferences such as SuperComputing and HiPEAC, and has provided OpenCL training for the UK's national supercomputing service and for the Barcelona Supercomputing Center. With OpenCL training experience ranging from half-day on-site introductions within companies to two-day intensive hands-on workshops for undergraduates, Simon can provide customized OpenCL training to meet your needs. Get in touch if you'd like to know more.

For more about the authors, please visit Simon's home page or Tom's home page.

Source Code for the Exercises and Solutions

These examples together with the HandsOnOpenCL slides are released under the "attribution CC BY" creative commons license. In other words, you can use these in any way you see fit, including commercially, but please retain an attribution for the original authors, Simon McIntosh-Smith and Tom Deakin.

Getting started

Please download a tarball from Releases, or check out the repository using git with the following command:

git clone git://github.com/HandsOnOpenCL/Exercises-Solutions.git

Found any issues or have some comments? Please submit a bug report in the Issue tab.

Pre-requisites

  • OpenCL 1.1 (or greater)
  • Python 2.7 (or greater)
  • C99 compiler (we use gcc) with OpenMP support (used for timing the runs [optional])
  • C++11 compiler (we use g++ or clang, also tested with Intel's icc)

Need help setting up OpenCL? Check out the first section in the lecture slides for information about setting up OpenCL on Linux for AMD (CPU, GPU, APU), Intel CPUs and NVIDIA GPUs.

Building

We assume here that your current working directory is the location of the source code; e.g. /path/to/Exercises-Solutions/Solutions/Exercise04/C

Python

Just run python source.py to run the code.

C

You must first run make to build the binary. We assume that your environment is set up to find the OpenCL library; if you have trouble try export CPATH=/path/to/OpenCL/include and export LD_LIBRARY_PATH=/path/to/OpenCL/lib.

You can also run make in the top-level Exercises/ and Solutions/ directories; this calls all the sub-directory Makefiles, so all the examples can be built in one command. This also builds all the C++ examples.

Define the variable DEVICE in the Makefiles to be one of the OpenCL device types to vary the device type the C applications use. This can be done easily in the two global Makefiles found in the Exercises and Solutions directories. To use a GPU, for example, change the line DEVICE = CL_DEVICE_TYPE_DEFAULT to DEVICE=CL_DEVICE_TYPE_GPU.

Note: you can also edit each of the source files to use a specific device type, but we would recommend using the global Makefile method above.

Define the variable CC to change the C compiler used. By default, this is set to gcc for all platforms.

C++

You must first run make to build the binary. We assume that your environment is set up to find the OpenCL library.

You can also run make in the top-level Exercises/ and Solutions/ directories; this calls all the sub-directory Makefiles, so all the examples can be built in one command. This also builds all the C examples.

Define the variable DEVICE in the Makefiles to be one of the OpenCL device types to vary the device type the C++ applications use. This can be done easily in the two global Makefiles found in the Exercises and Solutions directories. To use a GPU, for example, change the line DEVICE = CL_DEVICE_TYPE_DEFAULT to DEVICE=CL_DEVICE_TYPE_GPU.

Note: you can also edit each of the source files to use a specific device type, but we would recommend using the global Makefile method above.

Define the variable CPPC to change the C++ compiler used. By default, this is set to g++ on Linux, and clang++ on OS X.

Directory structure

The Exercises directory contains all the code needed to be handed out at the start of the tutorial for the exercises to be completed.

The Solutions directory contains sample code providing an example implementation which solves the exercises in the lecture notes.

Within both of the Exercises and Solutions directories, there is one subdirectory per exercise. Within each exercise subdirectory, there are further subdirectories for each implementation: C, C++ and Python.

exercises-solutions's People

Contributors

agitoz, benelgar, bmerry, gallagth, jrprice, raoulchartreuse, simonmcs, tomdeakin

exercises-solutions's Issues

Warning with float initialization

When float variables are initialized with a double-precision constant, I get an OpenCL compiler warning:
"/tmp/OCL9395T11.cl", line 19: warning: double-precision constant is
represented as single-precision constant because double is not
enabled
tmp = 0.0;
^

warn(text, CompilerWarning)

Changing every 0.0 to 0.0f will remove this kind of warning.
I know this is a very minor issue; would you like me to make the change?

Python improvements

Suggestions from Andreas Kloeckner, creator of PyOpenCL:

From a brief look at the slides, the only feedback would be that

prg.kernel(...)

reexecutes clCreateKernel() on every launch, so storing a reference to
the kernel may be more efficient. In addition, the issue of having to
cast arguments to numpy types can be alleviated by

http://documen.tician.de/pyopencl/runtime.html#pyopencl.Kernel.set_scalar_arg_dtypes

I'm not suggesting that you include this information (it might well be
that you left it out on purpose), I'm just trying to make sure you're
aware of it. :)

Python solution of Exercise08 prb with C_block_form.cl part

I had no problem launching most of the examples, but with the solution of exercise 9 I get this error message:
===== OpenCL, A and B in block form in local memory, order 1024 ======

Traceback (most recent call last):
File "matmul.py", line 187, in
d_a, d_b, d_c, localmem1, localmem2)
File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line 466, in kernel_call
global_offset, wait_for, g_times_l=g_times_l)
pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work group size
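
One common cause of this error in OpenCL 1.x (not confirmed as the cause in this report) is a global size that is not a whole multiple of the local size in some dimension, or a work-group with more work-items than the device allows. The following is a minimal sketch of a pre-launch check. The function name `work_group_problems` and the `max_work_group_size` parameter are hypothetical; the latter stands in for the limit a device reports.

```python
def work_group_problems(global_size, local_size, max_work_group_size):
    """Return a list of reasons a launch geometry would be rejected.

    Hypothetical helper: it mirrors two OpenCL 1.x rules only and is
    not part of PyOpenCL.
    """
    problems = []
    if len(global_size) != len(local_size):
        problems.append("global and local sizes have different dimensions")
        return problems

    # Rule 1: every global dimension must be a multiple of the local one.
    for dim, (g, l) in enumerate(zip(global_size, local_size)):
        if l <= 0 or g % l != 0:
            problems.append(
                "dimension %d: global %d is not a multiple of local %d"
                % (dim, g, l))

    # Rule 2: the total work-items per work-group must fit the device limit.
    total = 1
    for l in local_size:
        total *= l
    if total > max_work_group_size:
        problems.append(
            "work-group has %d items, device limit is %d"
            % (total, max_work_group_size))
    return problems


# A 16x16 block divides an order-1024 matrix but needs 256 work-items.
print(work_group_problems((1024, 1024), (16, 16), 256))
print(work_group_problems((1024, 1024), (16, 16), 128))
```

A check like this could run before the kernel call so that an unsuitable block size is reported in plain words rather than as a LogicError.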

Gameoflife example won't build on Mac OS X

If you use a recent Xcode on Mac OS X, it won't build gameoflife from Exercise13:

cc gameoflife.c -O3 -std=c99 -o gameoflife
gameoflife.c:102:5: error: second parameter of 'main' (argument array) must be of type 'char **'
int main(int argc, void **argv)
^

This is with the following version of the tools:

$ cc --version
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix

The fix is obvious.

Tests for Game of Life

Saved as a TODO.

We should include a sanity checking test suite for the game of life - provide some simple inputs and outputs and check the final states. This will be helpful when completing the exercise too.
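
A sanity check of this kind could be sketched as follows, using two well-known patterns: the block (a still life) and the blinker (which alternates between horizontal and vertical). This is an independent reference written for illustration, not the exercise code. The names `life_step`, `parse` and `run_sanity_checks` are hypothetical, and cells outside the grid are assumed dead. The patterns are kept one cell clear of the edges so that a wrap-around implementation should give the same result for them.

```python
def life_step(grid):
    """Advance a Game of Life grid (lists of 0/1) by one generation.

    Reference implementation for testing only. Assumption: cells
    outside the grid count as dead.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += grid[rr][cc]
            if grid[r][c] == 1:
                new[r][c] = 1 if neighbours in (2, 3) else 0
            else:
                new[r][c] = 1 if neighbours == 3 else 0
    return new


def parse(text):
    """Turn whitespace-separated rows of '.' and '#' into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in text.split()]


BLOCK = parse(".... .##. .##. ....")
BLINKER_H = parse("..... ..... .###. ..... .....")
BLINKER_V = parse("..... ..#.. ..#.. ..#.. .....")


def run_sanity_checks():
    # A still life must not change.
    assert life_step(BLOCK) == BLOCK
    # The blinker flips orientation every generation (period 2).
    assert life_step(BLINKER_H) == BLINKER_V
    assert life_step(BLINKER_V) == BLINKER_H
    return True


run_sanity_checks()
```

The same input and expected-output pairs could then be fed to the OpenCL version and compared against its final state.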

Change compiler for OS X

Saved as TODO.

  1. You need to add '-stdlib=libc++' to the compiler flags when building for C++11 with clang++, so I guess this should be added to the Makefiles for OS X.
  2. I was building with 'make CPPC=clang++', but I guess you could also have the Makefiles automatically select clang++ if on OS X
  3. Clang doesn't support OpenMP, which is a problem for Exercise06

Differences between Exercise06 solution in slides and code

In what is now slide 82, we list the solution for Exercise06, where the student should have written their own kernel for the first time by converting the sequential C code into a simple matrix multiply kernel.

The solution in the slides has a body that looks like this:

{
    int k;
    int i = get_global_id(0);
    int j = get_global_id(1);
    float tmp = 0.0f;
    for (k = 0; k < Pdim; k++) {
        tmp += A[i*Ndim+k] * B[k*Pdim+j];
    }
    C[i*Ndim+j] += tmp;
}

Whereas in the sequential C code solution provided in source form inside matrix_lib.c, its body looks like this:

for (i=0; i<Ndim; i++){
    for (j=0; j<Mdim; j++){
        tmp = 0.0;
        for(k=0;k<Pdim;k++){
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += *(A+(i*Ndim+k)) *  *(B+(k*Pdim+j));
         }
         *(C+(i*Ndim+j)) = tmp;
      }
}

This is a very different style of array addressing and could confuse the students. We should change the sequential C code inside matrix_lib.c in both the Exercise and the Solution so that the body looks like this:

for (i=0; i<Ndim; i++) {
    for (j=0; j<Mdim; j++) {
        tmp = 0.0f;
        for (k=0; k<Pdim; k++) {
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += A[i*Ndim+k] * B[k*Pdim+j];
         }
         C[i*Ndim+j] += tmp;
      }
}

Notice I've also added a few spaces inside the "for" statements, and changed the value assigned to tmp from 0.0 to 0.0f (just good practice!).

Note that in the actual OpenCL kernel solution for Exercise06 the code is as we would want it, i.e. consistent with the above, except that the 0.0 also needs to be changed to 0.0f.

C helper function has incomplete list of error numbers

The C helper function we provide, int err_code (cl_int err_in) in err_code.c, has an incomplete list of error codes it will recognise. In particular, it doesn't know about CL_DEVICE_NOT_FOUND, which is quite an important one.

This has already bitten me when one solution code expected a GPU, but my MacBook Air doesn't expose one.

It would be worth updating the list in err_code() against the latest OpenCL v1.1 header file and making it a complete set.

In fact, a simple script that would take the appropriate chunk from cl.h and turn it into err_code() would be useful as we migrate this to support v1.2 and 2.0 etc.
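
Such a script could look roughly like the sketch below. It assumes the error codes appear in the header as `#define CL_NAME value` lines with zero or negative integer values, and that the generated helper returns the macro name as a string. `SAMPLE_HEADER`, `parse_error_codes` and `emit_switch_cases` are hypothetical names, and the sample is a small excerpt rather than the full header.

```python
import re

# A few lines in the style of cl.h, used as sample input. A real run
# would read the error-code section of the header file instead.
SAMPLE_HEADER = """
#define CL_SUCCESS                                  0
#define CL_DEVICE_NOT_FOUND                         -1
#define CL_DEVICE_NOT_AVAILABLE                     -2
#define CL_COMPILER_NOT_AVAILABLE                   -3
#define CL_INVALID_VALUE                            -30
#define CL_INVALID_WORK_GROUP_SIZE                  -54
#define CL_FALSE                                    0
#define CL_DEVICE_TYPE_GPU                          (1 << 2)
"""

# Matches "#define CL_NAME <integer>" on a single line.
DEFINE_RE = re.compile(r"^#define[ \t]+(CL_\w+)[ \t]+(-?\d+)[ \t]*$",
                       re.MULTILINE)


def parse_error_codes(header_text):
    """Return (name, value) pairs for CL_SUCCESS and all negative codes."""
    codes = []
    for name, value in DEFINE_RE.findall(header_text):
        value = int(value)
        # Skip non-error macros that happen to be plain integers.
        if value < 0 or name == "CL_SUCCESS":
            codes.append((name, value))
    return codes


def emit_switch_cases(codes):
    """Emit C 'case' lines mapping each code to its own name."""
    return "\n".join(
        '        case %s: return "%s";' % (name, name) for name, _ in codes)


print(emit_switch_cases(parse_error_codes(SAMPLE_HEADER)))
```

The emitted lines are meant to be pasted into the body of a switch statement inside err_code(); regenerating them from a newer header would pick up codes added in later OpenCL versions.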

Can we make it easier to use Mac OSX?

For the Exercises and Solutions, it doesn't take much to get them all compiled and running on a Mac. All we have to do is modify two lines in the Makefiles from something that looks like this:

CCFLAGS=-O3 -lm -std=c99 -ffast-math

LIBS = -fopenmp -lOpenCL

To:

CCFLAGS=-O3 -lm -std=c99 -ffast-math -DAPPLE

LIBS = -fopenmp -framework OpenCL

There are two main ways we could do this:

  1. Use a condition inside the Makefile itself that looks for APPLE

  2. Use a make.def which we modify for each platform.

For previous versions of the course we used 2) with great effect, and I still have make.def files for Nvidia, AMD, Intel and Mac OSX.

Inconsistent use of timers

In the C++ code, some examples (matmul) use wtime() and some examples (pi_ocl) use the util::Timer.

They should probably all use the same timer.

Solution for Exercise06 assumes a GPU in C, but anything in Python

Just trying the solutions on my Apple Macbook Air. After changing the Makefiles to use -framework OpenCL and -DAPPLE, they compile OK. But the C code assumes it will find a GPU in the following code:

// Set up OpenCL context. queue, kernel, etc.
cl_uint numPlatforms;
// Find number of platforms
err = clGetPlatformIDs(0, NULL, &numPlatforms);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to find a platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Get all platforms
cl_platform_id Platform[numPlatforms];
err = clGetPlatformIDs(numPlatforms, Platform, NULL);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to get the platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Secure a device
for (int i = 0; i < numPlatforms; i++)
{
    err = clGetDeviceIDs(Platform[i], DEVICE, 1, &device_id, NULL);
    if (err == CL_SUCCESS)
        break;
}
if (device_id == NULL)
{
    printf("Error: Failed to create a device group!\n",err_code(err));
    return EXIT_FAILURE;
}

DEVICE is defined in matmul.h to be CL_DEVICE_TYPE_GPU.

This means the program exits with "Error: Failed to create a device group!".

Whereas the Python solution assumes any valid OpenCL device.

So, what do we want this to do? Make it CL_DEVICE_TYPE_DEFAULT in the C code?

Header file dependencies are missing from some Makefiles

I just noticed that not all the dependencies on header files are correctly captured in the Makefiles. This can lead to some erroneous behaviour when recompiling. The matrix multiply example and solution is one specific set of examples that suffers from this bug.

Problem with -fopenmp flag not being recognised with Xcode 5

Apple has moved to using the LLVM framework by default for Xcode 5, from gcc in previous versions of Xcode. The new Xcode doesn't have support for OpenMP by default, and so the -fopenmp flag in many of the Makefiles causes an error on Mac OS X.

This flag isn't strictly needed, so the fix is simply to remove it, especially on Mac OS X platforms.

Exercise 06 Python solution code takes too long on the host

When running the Python solution code on Blue Crystal, the initial CPU code is so slow, it feels like it's hung. For example, on my Nehalem test machine (a GPU node in Blue Crystal phase 1), I get:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======

1256.22704506 seconds at 1.70947095626 MFLOPS

20 minutes is a long time to wait, especially when the C version only takes about 10 seconds on the same machine:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
10.31 seconds at 208.2 MFLOPS

I think this is too long to wait; users will think something is wrong.

Either we need to make the Python faster on the CPU, or leave the CPU version commented out by default!
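
For reference, the MFLOPS figures quoted above are consistent with counting 2 x N^3 floating-point operations for an order-N multiply (one multiply and one add per inner-loop iteration). The sketch below reproduces them from the reported times; the function name `matmul_mflops` is hypothetical.

```python
def matmul_mflops(order, seconds):
    """MFLOPS for an order x order matrix multiply.

    Assumes 2 * order**3 floating-point operations in total.
    """
    flops = 2.0 * order ** 3
    return flops / seconds / 1.0e6


python_time = 1256.22704506  # seconds, from the Python run above
c_time = 10.31               # seconds, from the C run above

print("Python: %.2f MFLOPS" % matmul_mflops(1024, python_time))
print("C:      %.1f MFLOPS" % matmul_mflops(1024, c_time))
print("Python is about %.0f times slower" % (python_time / c_time))
```

On these numbers the pure Python loop is roughly two orders of magnitude slower than the C loop on the same machine.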

Exercise 6 C and Cpp solutions are incorrect

If I run make ; ./mult in Solutions/Exercise06/C or Solutions/Exercise06/Cpp I get the following output:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
 7.67 seconds at 279.9 MFLOPS 

===== OpenCL, matrix mult, C(i,j) per work item, order 1024 ======
 5.01 seconds at 428.9 MFLOPS 

 Errors in multiplication: 168394460495872.000000

This is the output from the C executable, although the Cpp one gives similar results.

Am I correct in thinking that the error should be somewhat smaller?
Is this a known bug?

I'm running OS X 10.9.5, Core i7, Intel HD Graphics 4000, NVIDIA GeForce GT 650M 1024 MB.
I believe the OpenCL kernel runs on the GeForce in these examples.

C++ timer not working?

The timer in Cpp_common/util.cpp might not work on some Apple systems. Trying to use the C++ timer program on some Mac OS X laptops can give absurd times (186302452924.23423 seconds for a simple vadd, for example).

Exercise 06 matrix mul doesn't report error

The exercise 6 code with the kernel deleted doesn't report an error when the buffer is read back. As no kernel is running, the buffer shouldn't contain the correct result!
