When developing a numerical simulation code with high performance and efficiency in mind, one is often compelled to accept a trade-off. Using a native-hardware programming model (such as CUDA or OpenCL), which has become tremendously challenging, means losing some cross-platform portability.
Porting a large legacy code to a modern HPC platform and developing a new simulation code are two different tasks, but both may benefit from a high-level programming model that abstracts away the low-level hardware details.
This training presents existing high-level programming solutions that preserve, as far as possible, performance, maintainability and portability across the wide diversity of modern hardware architectures (multicore CPUs, manycore processors, GPUs, ARM, ...), as well as software development productivity.
We will provide an introduction to the high-level C++ programming model Kokkos (https://github.com/kokkos) and, through hands-on sessions, show basic code examples illustrating the following concepts:
- hardware portability: design an algorithm once and let the Kokkos back-end (OpenMP, CUDA, ...) derive an efficient low-level implementation;
- efficient architecture-aware memory containers: what a Kokkos::View is;
- fundamental parallel patterns revisited with Kokkos: parallel_for, parallel_reduce, parallel_scan, ...;
- exploration of some mini-applications.
Several detailed examples in C/C++/Fortran will be used in the hands-on sessions on the high-end hardware platform Jean Zay (http://www.idris.fr/jean-zay/), which is equipped with Nvidia Tesla V100 GPUs.
Prerequisites: some basic knowledge of the CUDA programming model and of C++.