This workshop targets researchers and developers who already know the basics of OpenACC and/or CUDA but would like to expand their knowledge. It will build on the ENCCS workshop “Introduction to OpenACC/CUDA” given in May (https://enccs.se/events/2021/05/openacccuda-training-for-beginners/).
The workshop will consist of lectures, type-alongs, and hands-on sessions. Lectures will present the OpenACC framework through the three key steps of porting code to high-performance accelerators: analysis, parallelization, and optimization.
CUDA lectures will cover two main topics: how to optimize computational kernels for efficient execution on GPU hardware, and how to exploit task-based parallelism using streams and events. We will also briefly go through profiling tools that can help identify the computational bottlenecks of a program.
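As a rough illustration of the streams-and-events idea, the sketch below (hypothetical; kernel name and sizes are illustrative, and it requires `nvcc` and a GPU to build and run) launches work into two streams and uses an event so one stream can wait on the other without blocking the host.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

int main(void) {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t sA, sB;
    cudaEvent_t done;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);
    cudaEventCreate(&done);

    /* Work submitted to different streams may overlap on the GPU. */
    scale<<<(n + 255) / 256, 256, 0, sA>>>(a, 2.0f, n);
    cudaEventRecord(done, sA);         /* mark the end of stream A's work */
    cudaStreamWaitEvent(sB, done, 0);  /* stream B waits for that event   */
    scale<<<(n + 255) / 256, 256, 0, sB>>>(b, 3.0f, n);

    cudaDeviceSynchronize();
    cudaEventDestroy(done);
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```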
Participants are assumed to have knowledge of the C programming language. Since participants will be using an HPC cluster to run the examples, familiarity with Linux/Unix environments is assumed.
If you would like to attend this workshop but don’t have much prior experience with OpenACC and CUDA, we recommend that you carefully go through the lesson material of our introductory course, which is available at https://enccs.github.io/OpenACC-CUDA-beginners/.
This course is for students, researchers, engineers, and programmers who would like to expand their knowledge of OpenACC and CUDA. Some previous experience with C/C++ is required; basic knowledge of OpenACC/CUDA will help participants follow the material.
The workshop furthermore assumes that participants have some familiarity with logging in to supercomputers and using a bash terminal, and with compiling C/C++ or Fortran codes using compilers and makefiles.
Day 1 – Monday 28 June 2021
| Time | Topic |
|------|-------|
| 09:00–09:10 | Introduction to ENCCS |
| 09:10–09:30 | Introduction to GPUs |
| 09:30–09:50 | OpenACC: Analysis and Parallelization |
| 11:30–12:30 | CUDA: Parallel reduction use-case |
Day 2 – Tuesday 29 June 2021
| Time | Topic |
|------|-------|
| 09:00–09:10 | Follow-ups from day 1 |
| 09:10–10:00 | CUDA: Optimizing the reduction kernel |
| 10:30–11:20 | CUDA: Exploring task-based parallelism: streams and events |
| 11:50–12:20 | CUDA: Notes on profiling tools |