archer-gpu-course's Introduction
Introduction to GPU programming with CUDA/HIP

CC BY-NC-SA 4.0

This short course will provide an introduction to GPU computing with CUDA, aimed at scientific application programmers wishing to develop their own software. The course will give a background on the differences between CPU and GPU architectures as a prelude to introductory exercises in CUDA programming. The course will discuss the execution of kernels, memory management, and shared memory operations. Common performance issues will be discussed, along with how to address them. Profiling will be introduced via the current NVIDIA tools.

The course will go on to consider the execution of independent streams, and the execution of work composed as a collection of dependent tasks expressed as a graph. Device management and details of device-to-device data transfer will be covered for situations where more than one GPU device is available. CUDA-aware MPI will be covered.

The course will not discuss programming with compiler directives, but it does provide a concrete understanding of the underlying principles of the CUDA model, which is useful for programmers ultimately wishing to make use of OpenMP or OpenACC (or indeed other models). The course will not consider graphics programming, nor will it consider machine learning packages.

Note that the course is also appropriate for those wishing to use AMD GPUs via the HIP API, although we will not specifically use HIP.

Attendees must be able to program in C or C++ (course examples and exercises will limit themselves to C). A familiarity with threaded programming models would be useful, but no previous knowledge of GPU programming is required.

Installation

For details of how to log into a Cirrus account, see https://cirrus.readthedocs.io/en/main/user-guide/connecting.html

Clone the git repository to your Cirrus account. Note that the ${HOME/home/work} substitution below switches from the /home to the /work file system, which is the one visible to the compute nodes.

$ cd ${HOME/home/work}
$ git clone https://github.com/EPCCed/archer-gpu-course.git
$ cd archer-gpu-course

For the examples and exercises in the course, we will use the NVIDIA compiler driver nvcc. To access this

$ module load nvidia/nvhpc

Check that you can compile a very simple program and run it by submitting the associated script to the queue system.

$ cd section-2.01
$ nvcc -arch=sm_70 exercise_dscal.cu
$ sbatch submit.sh

The result should appear in a file slurm-123456.out in the working directory.
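
If you would like an independent check of the toolchain before starting the first exercise, a minimal program along the following lines should compile with the same nvcc command and can be submitted in the same way. This is a sketch only, not part of the course material; the kernel and variable names are illustrative.

#include <stdio.h>

/* Each thread writes its global index into the output array. */
__global__ void fill_index(int *out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = i;
}

int main(void) {
  const int n = 8;
  int h[n];
  int *d = NULL;

  cudaMalloc((void **) &d, n * sizeof(int));   /* device allocation */
  fill_index<<<1, n>>>(d);                     /* one block of n threads */
  cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d);

  for (int i = 0; i < n; i++) printf("%d ", h[i]);
  printf("\n");

  return 0;
}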

Each section of the course is associated with a different directory, each of which contains a number of example programs and exercise templates. Answers to exercises generally reappear as templates for later exercises. Miscellaneous solutions also appear in the solutions directory.

Timetable

The timetable may shift slightly in terms of content, but we will stick to the advertised start and finish times, and the break times.

Day one

Time   Section       Content
09:30  (see above)   Logistics, login, modules, local details
10:00  section-1.01  Introduction: performance model; graphics processors
10:30  section-1.02  The CUDA/HIP programming model: abstraction; host code and device code
11:00                Break
11:30  section-2.01  CUDA/HIP programming: memory management; cudaMalloc(), cudaMemcpy()
12:15  section-2.02  Executing a kernel: __global__ functions, <<<...>>>
13:00                Lunch
14:00  section-2.03  Some performance considerations: exercise on a matrix operation
15:00                Break
15:20  section-2.04  Managed memory: exercise on managed memory
15:50                Shared memory
16:10  section-2.05  Exercise on vector product
16:30                Constant memory
16:40  section-2.06  All together: matrix-vector product
17:00                Close
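
Ahead of the shared memory material and the vector product exercise (section-2.05), the kernel below shows the standard block-level tree reduction used to form per-block partial sums of a dot product. It is a generic sketch under stated assumptions, not the course solution: all names are illustrative and the block size is assumed to be a power of two.

/* Each block accumulates a partial sum of x[i]*y[i] in shared memory. */
__global__ void dot_partial(int n, const double *x, const double *y,
                            double *blocksum) {
  extern __shared__ double tmp[];   /* one entry per thread in the block */
  int tid = threadIdx.x;
  int i = blockIdx.x * blockDim.x + threadIdx.x;

  tmp[tid] = (i < n) ? x[i] * y[i] : 0.0;
  __syncthreads();

  /* Tree reduction within the block (blockDim.x assumed a power of two). */
  for (int s = blockDim.x / 2; s > 0; s /= 2) {
    if (tid < s) tmp[tid] += tmp[tid + s];
    __syncthreads();
  }

  if (tid == 0) blocksum[blockIdx.x] = tmp[0];
}

The dynamic shared memory size is supplied as the third launch parameter, e.g. dot_partial<<<nblocks, nthreads, nthreads*sizeof(double)>>>(n, x, y, blocksum); a final sum over blocksum[] is then needed on the host or in a second kernel.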

Day two

Time   Section       Content
09:00                Profiling: Nsight Systems and Nsight Compute
09:10  section-3.01  Using nsys and ncu
09:30  section-4.01  Streams: using cudaMemcpyAsync() etc.
10:00  section-4.02  Graph API: using cudaGraphLaunch() etc.
11:00                Break
11:30  section-5.01  Device management: more than one GPU; cudaMemcpy() again
12:15  section-5.02  Extra topic: GPU-aware MPI; exercise
13:00                Lunch
14:00  section-6.01  Putting it all together: conjugate gradient exercise
15:00                Break
15:20                Exercises
15:50  section-7.01  Miscellaneous comments
16:00                Close
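
As orientation for the streams material (section-4.01), the sketch below shows the basic pattern of issuing asynchronous copies and kernels on independent streams so that work on unrelated arrays may overlap. It is illustrative only and is not the course exercise; the kernel, sizes, and variable names are assumptions. Pinned host memory (cudaMallocHost()) is used so that cudaMemcpyAsync() can actually proceed asynchronously.

#include <stdio.h>

/* Scale each element of x by a. */
__global__ void scale(int n, double a, double *x) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

int main(void) {
  const int n = 1 << 20;
  const size_t nbytes = n * sizeof(double);
  double *h_a, *h_b, *d_a, *d_b;
  cudaStream_t stream[2];

  cudaMallocHost((void **) &h_a, nbytes);   /* pinned host memory */
  cudaMallocHost((void **) &h_b, nbytes);
  cudaMalloc((void **) &d_a, nbytes);
  cudaMalloc((void **) &d_b, nbytes);
  for (int i = 0; i < n; i++) { h_a[i] = 1.0; h_b[i] = 2.0; }

  for (int i = 0; i < 2; i++) cudaStreamCreate(&stream[i]);

  int nthreads = 256;
  int nblocks = (n + nthreads - 1)/nthreads;

  /* Issue copies and kernels on independent streams so they may overlap. */
  cudaMemcpyAsync(d_a, h_a, nbytes, cudaMemcpyHostToDevice, stream[0]);
  cudaMemcpyAsync(d_b, h_b, nbytes, cudaMemcpyHostToDevice, stream[1]);
  scale<<<nblocks, nthreads, 0, stream[0]>>>(n, 2.0, d_a);
  scale<<<nblocks, nthreads, 0, stream[1]>>>(n, 0.5, d_b);
  cudaMemcpyAsync(h_a, d_a, nbytes, cudaMemcpyDeviceToHost, stream[0]);
  cudaMemcpyAsync(h_b, d_b, nbytes, cudaMemcpyDeviceToHost, stream[1]);

  for (int i = 0; i < 2; i++) {
    cudaStreamSynchronize(stream[i]);   /* wait for all work in each stream */
    cudaStreamDestroy(stream[i]);
  }

  printf("h_a[0] = %f, h_b[0] = %f\n", h_a[0], h_b[0]);

  cudaFree(d_a); cudaFree(d_b);
  cudaFreeHost(h_a); cudaFreeHost(h_b);
  return 0;
}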

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

archer-gpu-course's People

Contributors

dependabot[bot], juanfrh, kevinstratford, nb-epcc, roliveira, rupertnash, timspainucl

archer-gpu-course's Issues

Proposed two-day CUDA course

Course overview:

This short course will provide an introduction to GPU computing with CUDA, aimed at scientific application programmers wishing to develop their own software. The course will give a background on the differences between CPU and GPU architectures as a prelude to introductory exercises in CUDA programming. The course will discuss the execution of kernels, memory management, and shared memory operations. Common performance issues will be discussed, along with how to address them. Profiling will be introduced via the current NVIDIA tools.

The course will go on to consider the execution of independent streams, and the execution of work composed as a collection of dependent tasks expressed as a graph. Device management and details of device-to-device data transfer will be covered for situations where more than one GPU device is available. CUDA-aware MPI will be covered.

The course will not discuss programming with compiler directives, but it does provide a concrete understanding of the underlying principles of the CUDA model, which is useful for programmers ultimately wishing to make use of OpenMP or OpenACC. The course will not consider graphics programming, nor will it consider machine learning packages.

Note that the course is also appropriate for those wishing to use AMD GPUs via the HIP API, although we will not specifically use HIP.

Attendees must be able to program in C or C++ (course examples and exercises will limit themselves to C). A familiarity with threaded programming models would be useful, but no previous knowledge of GPU programming is required.

Draft timetable:

D A Y  O N E
============

09:30 - 10:00  Logistics and logging in
10:00 - 10:30  Introduction and GPU concepts/architectures
10:30 - 11:00  The CUDA/HIP programming model

11:00 - 11:30  Break
11:30 - 12:00  CUDA programming: kernels
12:00 - 13:00  A first CUDA exercise: operation on a vector

13:00 - 14:00  Lunch

14:00 - 14:30  Programming: memory considerations
14:30 - 15:00  Exercise: operation on a matrix

15:00 - 15:20  Break

15:20 - 15:45  Unified/managed memory
15:45 - 16:00  Exercise: managed memory
16:00 - 16:20  Threaded programming and synchronisation
16:20 - 17:00  Exercise: Reduction for vector product
17:00          Close


D A Y  T W O
============

09:00 - 09:10  Detour: using profiling
09:10 - 10:00  Exercise: nsight and nsight systems
10:00 - 10:40  Device Management
               The idea of streams; its extension to CUDA Graph API
10:40 - 11:00  Exercise: graph API

11:00 - 11:30  Break
11:30 - 12:00  More than one GPU: GPU to GPU transfers
               GPU aware MPI
12:00 - 13:00  Exercises (cont.)

13:00 - 14:00  Lunch

14:00 - 14:10  Put it all together: Conjugate gradient (CG) algorithm
14:10 - 15:00  CG exercise

15:00 - 15:20  Break
15:20 - 15:50  CG exercise (cont.)
15:50 - 16:00  Some miscellaneous observations
16:00          Close

LaTeX math formulas aren't rendered on Firefox

Hi, in the first and second exercises (cuda-intro and cuda-optimise), the inline math ($...$) and display math (<script type="math/tex; mode=display">...</script>) are not rendered correctly using:

  • Firefox 71 and 72 on Ubuntu (me)
  • Firefox on Windows (my neighbor)

The inline math is left as-is and the display math is not rendered at all; see the attached screenshots. Is this Firefox-specific? Are we supposed to install some extension to render this markup?

Screenshot_2020-01-09 Introduction A first CUDA program
Screenshot_2020-01-09 Introduction Optimsing a CUDA application
