handsonopencl / exercises-solutions

C, C++ and Python Code for Exercises and Solutions

License: Other

C 21.95% C++ 68.17% Makefile 2.20% Python 5.48% Cuda 1.30% Shell 0.02% Objective-C 0.88%

Introduction

This set of freely available OpenCL exercises and solutions, together with the HandsOnOpenCL slides, has been created by Simon McIntosh-Smith and Tom Deakin from the University of Bristol in the UK, with financial support from the Khronos Initiative for Training and Education (KITE) to promote the use of open standards.

Simon McIntosh-Smith is one of the foremost OpenCL trainers in the world, having taught the subject since 2009. He has run many OpenCL training courses at conferences such as SuperComputing and HiPEAC, and has provided OpenCL training for the UK's national supercomputing service and for the Barcelona Supercomputing Center. With OpenCL training experience ranging from half-day on-site introductions within companies to two-day intensive hands-on workshops for undergraduates, Simon can provide customized OpenCL training to meet your needs. Get in touch if you'd like to know more.

For more about the authors, please visit Simon's home page or Tom's home page.

Source Code for the Exercises and Solutions

These examples, together with the HandsOnOpenCL slides, are released under the Creative Commons "Attribution" (CC BY) license. In other words, you can use these in any way you see fit, including commercially, but please retain an attribution for the original authors, Simon McIntosh-Smith and Tom Deakin.

Getting started

Please download a tarball from Releases, or check out the repository using git with the following command:

git clone git://github.com/HandsOnOpenCL/Exercises-Solutions.git

Found any issues or have some comments? Please submit a bug report in the Issue tab.

Pre-requisites

OpenCL 1.1 (or greater)
Python 2.7 (or greater)
C99 compiler (we use gcc) with OpenMP support (used for timing the runs [optional])
C++11 compiler (we use g++ or clang, also tested with Intel's icc)

Need help setting up OpenCL? Check out the first section in the lecture slides for information about setting up OpenCL on Linux for AMD (CPU, GPU, APU), Intel CPUs and NVIDIA GPUs.

Building

We assume here that your current working directory is the location of the source code; e.g. /path/to/Exercises-Solutions/Solutions/Exercise04/C

Python

Just run python source.py to run the code.

C

You must first run make to build the binary. We assume that your environment is set up to find the OpenCL library; if you have trouble, try export CPATH=/path/to/OpenCL/include and export LD_LIBRARY_PATH=/path/to/OpenCL/lib.

You can also run make in the top-level Exercises/ and Solutions/ directories; this calls all the sub-directory Makefiles, so all the examples can be built with one command. This also builds all the C++ examples.

Set the variable DEVICE in the Makefiles to one of the OpenCL device types to choose which device type the C applications use. This can be done easily in the two global Makefiles found in the Exercises and Solutions directories. To use a GPU, for example, change the line DEVICE = CL_DEVICE_TYPE_DEFAULT to DEVICE = CL_DEVICE_TYPE_GPU.

Note: you can also edit each of the source files to use a specific device type, but we would recommend using the global Makefile method above.

Define the variable CC to change the C compiler used. By default, this is set to gcc for all platforms.

C++

You must first run make to build the binary. We assume that your environment is set up to find the OpenCL library.

Set the variable DEVICE in the Makefiles to one of the OpenCL device types to choose which device type the C++ applications use. This can be done easily in the two global Makefiles found in the Exercises and Solutions directories. To use a GPU, for example, change the line DEVICE = CL_DEVICE_TYPE_DEFAULT to DEVICE = CL_DEVICE_TYPE_GPU.

Note: you can also edit each of the source files to use a specific device type, but we would recommend using the global Makefile method above.

Define the variable CPPC to change the C++ compiler used. By default, this is set to g++ on Linux, and clang++ on OS X.

Directory structure

The Exercises directory contains all the code needed to be handed out at the start of the tutorial for the exercises to be completed.

The Solutions directory contains sample code providing an example implementation which solves the exercises in the lecture notes.

Within both of the Exercises and Solutions directories, there is one subdirectory per exercise. Within each exercise subdirectory, there are further subdirectories for each implementation: C, C++ and Python.

exercises-solutions's Issues

Warning with float initialization

When float variables are initialized with a constant, I get OpenCL compiler warnings such as:
"/tmp/OCL9395T11.cl", line 19: warning: double-precision constant is
represented as single-precision constant because double is not
enabled
tmp = 0.0;
^

warn(text, CompilerWarning)

Replacing every 0.0 with 0.0f removes this kind of warning.
I know this is a very minor issue; would you like me to make the change?
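
The replacement suggested above could be scripted. The following is only a sketch: the function name add_float_suffix is made up for illustration, and the regular expression is an assumption about what counts as an unsuffixed literal (decimal literals with a dot, optionally with an exponent). It does not understand comments or strings, so any output should be reviewed before committing.

```python
import re

# Matches a decimal floating-point literal with no type suffix,
# e.g. "0.0" or "1.5e-3", but not "0.0f" and not the integer "10".
_UNSUFFIXED_FLOAT = re.compile(
    r"(?<![\w.])(\d+\.\d*|\.\d+)([eE][+-]?\d+)?(?![\w.])"
)


def add_float_suffix(kernel_source):
    """Return kernel_source with 'f' appended to unsuffixed float literals.

    Hypothetical helper: it rewrites every match, including any that
    happen to sit inside comments or string literals.
    """
    return _UNSUFFIXED_FLOAT.sub(lambda m: m.group(0) + "f", kernel_source)


# Example: the line quoted in the warning above.
print(add_float_suffix("tmp = 0.0;"))
```

Literals that already carry the f suffix are left alone, so the function can be run over the same .cl file more than once.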

Python improvements

Suggestions from Andreas Kloeckner, creator of PyOpenCL:

From a brief look at the slides, the only feedback would be that

prg.kernel(...)

reexecutes clCreateKernel() on every launch, so storing a reference to
the kernel may be more efficient. In addition, the issue of having to
cast arguments to numpy types can be alleviated by

http://documen.tician.de/pyopencl/runtime.html#pyopencl.Kernel.set_scalar_arg_dtypes

I'm not suggesting that you include this information (it might well be
that you left it out on purpose), I'm just trying to make sure you're
aware of it. :)

Matrix Multiply C code for solution 6

Saved as TODO

Pi C code for Solution 8

Saved as TODO

Python solution of Exercise08 prb with C_block_form.cl part

I had no problem launching most of the examples, but with the solution to exercise 9 I get this error message:
===== OpenCL, A and B in block form in local memory, order 1024 ======

Traceback (most recent call last):
  File "matmul.py", line 187, in <module>
    d_a, d_b, d_c, localmem1, localmem2)
  File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line 466, in kernel_call
global_offset, wait_for, g_times_l=g_times_l)
pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work group size
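
The cause of this particular failure is not stated here. In OpenCL 1.x, two things worth ruling out for "invalid work group size" are a global size that is not a multiple of the local size in some dimension, and a work-group with more work-items than the device or kernel allows. A minimal sketch of such a check follows; the function name and the sizes used in the example are hypothetical, not taken from matmul.py.

```python
def check_work_group_size(global_size, local_size, max_work_group_size):
    """Return a list of problems with an NDRange launch configuration.

    global_size and local_size are tuples with one entry per dimension;
    max_work_group_size is the limit reported for the kernel or device.
    An empty list means none of the conditions checked here is violated.
    """
    if len(global_size) != len(local_size):
        return ["global and local sizes have different numbers of dimensions"]
    problems = []
    work_items = 1
    for dim, (glob, loc) in enumerate(zip(global_size, local_size)):
        if loc <= 0:
            problems.append("dimension %d: local size must be positive" % dim)
            continue
        if glob % loc != 0:
            problems.append(
                "dimension %d: global size %d is not a multiple of local size %d"
                % (dim, glob, loc))
        work_items *= loc
    if work_items > max_work_group_size:
        problems.append(
            "work-group has %d work-items, limit is %d"
            % (work_items, max_work_group_size))
    return problems


# Example: order-1024 matrices with an assumed 16x16 block on a device
# whose limit is assumed to be 128 work-items per work-group.
print(check_work_group_size((1024, 1024), (16, 16), 128))
```

The limit itself would have to be queried from the device or kernel at run time; some CPU devices report much smaller limits than GPUs do.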

Need to update top-level README with instructions for building C++

The top level README describes how to build and run the C and Python exercises and solutions, but doesn't mention the C++ ones. Need to add this.

Gameoflife example won't build on Mac OS X

If you use a recent Xcode on Mac OS X, it won't build gameoflife from Exercise13:

cc gameoflife.c -O3 -std=c99 -o gameoflife
gameoflife.c:102:5: error: second parameter of 'main' (argument array) must be of type 'char **'
int main(int argc, void **argv)
^

This is with the following version of the tools:

$ cc --version
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix

The fix is obvious.

C++ Versions of Exercise 06 and 07 (inc. Solutions) to add

Saved as TODO

Tests for Game of Life

Saved as a TODO.

We should include a sanity checking test suite for the game of life - provide some simple inputs and outputs and check the final states. This will be helpful when completing the exercise too.
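
As a starting point, here is a sketch of what such a sanity check might look like. It is a plain Python reference step, not the exercise code. The grids are held in memory rather than read from the exercise's input files, and cells outside the grid are assumed to be dead; both are assumptions, and the exercise may well use wrap-around boundaries instead.

```python
def life_step(grid):
    """Advance a Game of Life grid by one generation.

    grid is a list of rows of 0/1 ints. Cells outside the grid are
    treated as dead (an assumption for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += grid[rr][cc]
            if grid[r][c] == 1:
                # A live cell survives with two or three live neighbours.
                new_grid[r][c] = 1 if neighbours in (2, 3) else 0
            else:
                # A dead cell becomes live with exactly three.
                new_grid[r][c] = 1 if neighbours == 3 else 0
    return new_grid


# Known patterns: a blinker oscillates with period 2, a block never changes.
BLINKER_H = [[0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]
BLINKER_V = [[0, 0, 0, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 0, 0, 0, 0]]
BLOCK = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]

assert life_step(BLINKER_H) == BLINKER_V
assert life_step(BLINKER_V) == BLINKER_H
assert life_step(BLOCK) == BLOCK
```

The same small patterns could be fed to the OpenCL version and the final states compared against this reference.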

Python version of Exercise06 still running 20 minute host version

Needs commenting out, as we did with the solution.

Platform Information in C

Saved as TODO

Change compiler for OS X

Saved as TODO.

You need to add '-stdlib=libc++' to the compiler flags when building for C++11 with clang++, so I guess this should be added to the Makefiles for OS X.
I was building with 'make CPPC=clang++', but I guess you could also have the Makefiles automatically select clang++ if on OS X.
Clang doesn't support OpenMP, so this is a problem for Exercise06.

Matrix Multiply C code for solution 7

Saved as TODO

Matrix Multiply C code for exercise 6

Saved as TODO

Pi serial C code for Exercise 8

Saved as TODO

Python version of Exercise08 much slower than C version

On an Nvidia M2050 and a fast Nehalem host, the C code takes about 0.9s while the Python version takes about 63s. These should ideally take an almost identical amount of time.

Differences between Exercise06 solution in slides and code

In what is now slide 82, we list the solution for Exercise06, where the student should have written their own kernel for the first time by converting the sequential C code into a simple matrix multiply kernel.

The solution in the slides has a body that looks like this:

{
    int k;
    int i = get_global_id(0);
    int j = get_global_id(1);
    float tmp = 0.0f;
    for (k = 0; k < Pdim; k++) {
        tmp += A[i*Ndim+k] * B[k*Pdim+j];
    }
    C[i*Ndim+j] += tmp;
}

Whereas in the sequential C code solution provided in source form inside matrix_lib.c, its body looks like this:

for (i=0; i<Ndim; i++){
    for (j=0; j<Mdim; j++){
        tmp = 0.0;
        for(k=0;k<Pdim;k++){
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += *(A+(i*Ndim+k)) *  *(B+(k*Pdim+j));
         }
         *(C+(i*Ndim+j)) = tmp;
      }
}

This is a very different style of array addressing and could confuse the students. We should change the sequential C code inside matrix_lib.c in both the Exercise and the Solution so that the body looks like this:

for (i=0; i<Ndim; i++) {
    for (j=0; j<Mdim; j++) {
        tmp = 0.0f;
        for (k=0; k<Pdim; k++) {
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += A[i*Ndim+k] * B[k*Pdim+j];
         }
         C[i*Ndim+j] += tmp;
      }
}

Notice I've also added a few spaces inside the "for" statements, and also changed the initial value of tmp from 0.0 to 0.0f (just good practice!).

Note that in the actual OpenCL kernel solution for Exercise06 the code is as we would want it, i.e. consistent with the above, except that the 0.0 also needs to be changed to 0.0f.
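
For comparison, the same subscript-style addressing can be written in Python over flat lists. This is a sketch only, assuming square matrices so that Ndim, Mdim and Pdim all share one value n; the function name matmul_flat is made up for illustration.

```python
def matmul_flat(n, A, B):
    """Multiply two n x n matrices stored as flat row-major lists."""
    C = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            tmp = 0.0
            for k in range(n):
                # C(i,j) = sum(over k) A(i,k) * B(k,j)
                tmp += A[i * n + k] * B[k * n + j]
            # Assigned rather than accumulated, as in the existing
            # matrix_lib.c body quoted above.
            C[i * n + j] = tmp
    return C


# 2 x 2 check: [[1, 2], [3, 4]] times [[5, 6], [7, 8]].
print(matmul_flat(2, [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Element (i, k) of a matrix lives at index i * n + k, which is the same arithmetic the C subscripts perform.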

C helper function has incomplete list of error numbers

The C helper function we provide, int err_code (cl_int err_in) in err_code.c, has an incomplete list of error codes it will recognise. In particular, it doesn't know about CL_DEVICE_NOT_FOUND, which is quite an important one.

This has already bitten me when one solution code expected a GPU, but my MBA doesn't expose one.

It would be worth updating the list in err_code() against the latest OpenCL v1.1 header file and making it a complete set.

In fact, a simple script that would take the appropriate chunk from cl.h and turn it into err_code() would be useful as we migrate this to support v1.2 and 2.0 etc.
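
A rough sketch of what such a script might look like is below. The header excerpt is a small hand-written sample in the style of cl.h, and the generated function name err_code_name and its return type are hypothetical; the real err_code() has a different signature, so the output would need adapting.

```python
import re

# Small sample in the style of the error-code section of cl.h.
SAMPLE_HEADER = """\
#define CL_SUCCESS                                  0
#define CL_DEVICE_NOT_FOUND                         -1
#define CL_DEVICE_NOT_AVAILABLE                     -2
#define CL_COMPILER_NOT_AVAILABLE                   -3
"""

# One "#define CL_NAME <integer>" per line.
_DEFINE = re.compile(r"^\s*#define\s+(CL_[A-Z0-9_]+)\s+(-?\d+)\s*$")


def parse_error_codes(header_text):
    """Return [(name, value), ...] for each '#define CL_NAME <int>' line."""
    codes = []
    for line in header_text.splitlines():
        match = _DEFINE.match(line)
        if match:
            codes.append((match.group(1), int(match.group(2))))
    return codes


def generate_err_code_c(codes):
    """Emit C source for a lookup function mapping codes to their names.

    The function name err_code_name is an assumption for this sketch.
    """
    lines = ["const char *err_code_name(cl_int err_in)",
             "{",
             "    switch (err_in) {"]
    for name, _value in codes:
        lines.append('        case %s: return "%s";' % (name, name))
    lines.append('        default: return "UNKNOWN ERROR";')
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


print(generate_err_code_c(parse_error_codes(SAMPLE_HEADER)))
```

The script should only be given the error-code chunk of the header. Other definitions with the same numeric values (CL_FALSE is also 0, for example) would produce duplicate case labels that the C compiler rejects.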

Can we make it easier to use Mac OSX?

For the Exercises and Solutions, it doesn't take much to get them all compiled and running on a Mac. All we have to do is modify two lines in the Makefiles from something that looks like this:

CCFLAGS=-O3 -lm -std=c99 -ffast-math

LIBS = -fopenmp -lOpenCL

To:

CCFLAGS=-O3 -lm -std=c99 -ffast-math -DAPPLE

LIBS = -fopenmp -framework OpenCL

There are two main ways we could do this:

1) Use a condition inside the Makefile itself that looks for APPLE
2) Use a make.def which we modify for each platform.

For previous versions of the course we used 2) with great effect, and I still have make.def files for Nvidia, AMD, Intel and Mac OSX.

Inconsistent use of timers

In the C++ code, some examples (matmul) use wtime() and some examples (pi_ocl) use the util::Timer.

They should probably all be consistent with each other.

Check for OS X defined APPLE preprocessor define

Might be a problem with PI on CPU

The pi_vocl program doesn't work. The kernel enqueue throws an exception, especially when trying to run on a CPU.

Nicer error output for C++

Saved as TODO.

C has the error printed out, but C++ has the number. This isn't very helpful!

Solution for Exercise06 assumes a GPU in C, but anything in Python

Just trying the solutions on my Apple Macbook Air. After changing the Makefiles to use -framework OpenCL and -DAPPLE, they compile OK. But the C code assumes it will find a GPU in the following code:

// Set up OpenCL context, queue, kernel, etc.
cl_uint numPlatforms;
// Find number of platforms
err = clGetPlatformIDs(0, NULL, &numPlatforms);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to find a platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Get all platforms
cl_platform_id Platform[numPlatforms];
err = clGetPlatformIDs(numPlatforms, Platform, NULL);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to get the platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Secure a device
for (int i = 0; i < numPlatforms; i++)
{
    err = clGetDeviceIDs(Platform[i], DEVICE, 1, &device_id, NULL);
    if (err == CL_SUCCESS)
        break;
}
if (device_id == NULL)
{
    printf("Error: Failed to create a device group!\n",err_code(err));
    return EXIT_FAILURE;
}

DEVICE is defined in matmul.h to be CL_DEVICE_TYPE_GPU.

This means the program exits with "Error: Failed to create a device group!".

Whereas the Python solution assumes any valid OpenCL device.

So, what do we want this to do? Make it CL_DEVICE_TYPE_DEFAULT in the C code?

Error output for C doesn't use the err_code.c file

C VADD kernel for exercise 2

Saved as TODO

Matrix multiply host code is too complicated

Provide a matrix multiply host code to just run a single kernel, rather than multiple kernels along with the serial version.

Need to add something on which license we're using

Need to make sure the license for use of this material is clear. We're going for creative commons, I think this one (most open):

http://creativecommons.org/licenses/by/3.0/

Global make clean doesn't work as expected

Header file dependencies are missing from some Makefiles

I just noticed that not all the dependencies on header files are correctly captured in the Makefiles. This can lead to some erroneous behaviour when recompiling. The matrix multiply example and solution is one specific set of examples that suffers from this bug.

The err_code() function wouldn't compile

In the latest version, the err_code() function contained a bug where the variable err_in was misnamed as err_int at the end of the function.

Include final_state.dat for Game of Life examples

Python for Exercise 2 missing

There's no Python directory or solution for Exercise 2. Should there be?

C chained vadd for solution 4

Saved as TODO

Problem with -fopenmp flag not being recognised with Xcode 5

Apple has moved to using the LLVM framework by default for Xcode 5, from gcc in previous versions of Xcode. The new Xcode doesn't have support for OpenMP by default, and so the -fopenmp flag in many of the Makefiles causes an error on Mac OS X.

This flag isn't strictly needed, so the fix is simply to remove it, especially on Mac OS X platforms.

Top-level Makefile chooses wrong compiler on Mac OS X

On Mac OS X we want to make sure we always use clang so that we get the C++11 support we need for the C++ wrapper, cl.hpp. But the top-level Makefiles set CC to gcc even on Mac OS X. Need to fix this.

Global make file for the C code

Saved as TODO.

C code for D=A+B+C for solution 5

Saved as TODO

C++ global makefile

Saved as TODO

Exercise 06 Python solution code takes too long on the host

When running the Python solution code on Blue Crystal, the initial CPU code is so slow, it feels like it's hung. For example, on my Nehalem test machine (a GPU node in Blue Crystal phase 1), I get:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======

1256.22704506 seconds at 1.70947095626 MFLOPS

20 minutes is a long time to wait, especially when the C version only takes about 10 seconds on the same machine:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
10.31 seconds at 208.2 MFLOPS

I think this is too long to wait; users will think something is wrong.

Either we need to make the Python faster on the CPU, or leave the CPU version commented out by default!
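
The figures quoted above are consistent with counting 2 x order^3 floating-point operations for an order x order multiply. The sketch below reproduces that arithmetic and adds a rough way to estimate the host time from a small run before committing to the full size. Both function names are made up for illustration, and the cubic extrapolation is only an approximation.

```python
def mflops(order, seconds):
    """MFLOPS for an order x order matrix multiply that took `seconds`.

    Counts one multiply and one add per inner-loop iteration,
    i.e. 2 * order**3 floating-point operations in total.
    """
    return 2.0 * order ** 3 / (1.0e6 * seconds)


def estimated_seconds(order, measured_order, measured_seconds):
    """Extrapolate a run time to another order, assuming O(order**3) cost."""
    return measured_seconds * (float(order) / measured_order) ** 3


# The Python host run reported above: order 1024 in about 1256 seconds.
print(mflops(1024, 1256.22704506))

# If an order-128 run took 1 second, order 1024 would take about 8**3 times as long.
print(estimated_seconds(1024, 128, 1.0))
```

An estimate like this could be used to decide whether to run or skip the sequential host version by default.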

Exercise 6 C and Cpp solutions are incorrect

If I run make ; ./mult in Solutions/Exercise06/C or Solutions/Exercise06/Cpp I get the following output:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
 7.67 seconds at 279.9 MFLOPS 

===== OpenCL, matrix mult, C(i,j) per work item, order 1024 ======
 5.01 seconds at 428.9 MFLOPS 

 Errors in multiplication: 168394460495872.000000

This is the output from the C executable, although the Cpp one gives similar results.

Am I correct in thinking that the error should be somewhat smaller?
Is this a known bug?

I'm running OS X 10.9.5, Core i7, Intel HD Graphics 4000, NVIDIA GeForce GT 650M 1024 MB.
I believe the OpenCL kernel runs on the GeForce in these examples.

Template code for Exercise04 missing

Exercise04 is completely empty. I think we need to put some template code here, possibly even the solution to Exercise 2/3.

No Python host code for Exercise 6

Saved as a TODO

Unclear Game of Life examples exist

Update the README

Might need to add std:: prefix to isnan() call in matrix_lib.cpp

A potential bug report from a user:

"I had to make a change to matrix_lib.cpp.

 if(isnan(errsq) || ...

I had to add "std::"

 if(std::isnan(errsq) || …"

They were using Mac OS X 10.7 "Lion" with gcc 4.8:

"g++-mp-4.8 -std=c++11"
