GPU Programming Workshop @ EPCC



Graphics Processing Units (GPUs) were originally designed to display computer graphics, but they have developed into extremely powerful chips capable of handling demanding, general-purpose calculations. The GPU architecture is inherently better suited than the traditional CPU to many types of intensive parallel computation, so computationally demanding sections of code can be accelerated to significantly increase overall performance. This is true not just for small-scale applications run on desktop-sized machines, but also for the largest-scale applications on massively parallel architectures. For example, the newly announced Cray XK6 supercomputer allows thousands of NVIDIA GPUs to be exploited in parallel to tackle grand challenge problems.

Applications must be adapted to utilise GPUs: most lines of application source code are still executed on the CPU, while key computational kernels are distributed to the GPU cores. Currently, for NVIDIA GPUs, the most popular programming method is the CUDA API, which is extremely powerful but requires significant development effort. OpenCL is an alternative API, which is less mature than CUDA but has portability advantages. Recently, a new higher-level standard has emerged, OpenACC, which promises to offer higher productivity: the programmer uses “directives” in the code to provide the compiler with the information required to automatically offload code to the GPU.

In this 3-day course we will introduce and provide hands-on experience of CUDA, OpenCL (with more emphasis on the former) and OpenACC. In many cases it is relatively straightforward to port a code to the GPU, but much harder to obtain good performance: we will cover a range of common GPU optimisation techniques.

No prior HPC or parallel programming knowledge is assumed, but attendees must already be able to program in C, C++ or Fortran. Access will be given to appropriate hardware for all the exercises. This course is free to all academics.
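As a rough illustration of the directive-based approach described above, the following minimal sketch annotates a simple loop with an OpenACC directive. It is not taken from the course material: the function name saxpy and the choice of data clauses are assumptions made for this example. A compiler without OpenACC support ignores the pragma, so the same source still runs on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The name "saxpy" and the data clauses below are assumptions for this
 * sketch, not part of the course exercises. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* The directive asks an OpenACC compiler to offload the loop.
     * copyin: x is only read on the device.
     * copy:   y is read on the device and copied back afterwards.
     * Compilers without OpenACC support skip this line. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

To target a GPU, the file would be built with an OpenACC-capable compiler, for example with gcc -fopenacc or nvc -acc. Whether this gives a speed-up depends on the problem size and on data-transfer costs, which is the kind of issue the optimisation sessions address.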
Pre-requisite Programming Languages

Fortran, C or C++. It is not possible to complete the exercises in Java.

Practical Templates and Documentation: CUDA and OpenCL

Example Timetable

Day 1
09:00 Registration
09:30 Lecture: Introduction and GPU Architecture
10:15 Lecture: Programming with CUDA
11:00 break
11:30 Practical: Getting started with CUDA
12:30 lunch
13:30 Lecture: GPU Optimisation
14:00 Practical: Optimising a CUDA application
15:00 break
15:30 Case study: Scaling an Application to a Thousand GPUs and Beyond
16:30 close

Day 2
09:00 Lecture: Programming with OpenCL
09:45 Practical: OpenCL programming *or* continue CUDA practical
11:00 break
11:30 Practical (cont.)
12:15 lunch
13:00 OpenACC Welcome and overview
13:15 OpenACC Session 1: An Introduction to OpenACC
  13:15 Lecture: The OpenACC programming model
  14:15 Practical: compiling and running a sample OpenACC code
14:45 break
15:15 OpenACC Session 2: Accelerating a simple code
  15:15 Worked example: OpenACC-ing a simple code
  15:45 Practical: accelerating the simple code
16:30 close

Day 3
09:00 OpenACC Session 3: Accelerating a larger code
  09:00 Lecture: Preparing to OpenACC a code
  09:45 Worked example: OpenACC-ing a larger code
  10:15 Practical: preparing and accelerating a larger application
10:45 break
11:15 Nvidia Roadmap Update (Timothy Lanfear)
12:30 lunch
13:30 OpenACC Session 4: Improving OpenACC performance
  13:30 Lecture: OpenACC performance tuning and interoperability
  14:15 Practical: continuing to accelerate a larger code
15:00 break
15:30 OpenACC Session 6: OpenACC for parallel applications
  15:30 Case study: the parallel Multigrid and Himeno codes
16:15 Summary and outlook
16:30 close