[POSTPONED] Performance portability for GPU application using high-level programming approaches with Kokkos @ MdlS/Idris

CET
Idris

Idris

Campus universitaire d'Orsay rue John Von Neumann Bâtiment 506 F-91403 Orsay cedex
Description

When developing a numerical simulation code with high performance and efficiency in mind, one is often compelled to accept a trade-off between using a native-hardware programming model (like CUDA or OpenCL), which has become tremendously challenging, and loosing some cross-platform portability.

Porting a large existing legacy code to a modern HPC platform, and developing a new simulation code, are two different tasks that may be benefit from a high-level programming model, which abstracts the low-level hardware details.

This training presents existing high-level programming solutions that can preserve at best as possible performance, maintainability and portability across the vast diversity of modern hardware architectures (multicore CPU, manycore, GPU, ARM, ..) and software development productivity.

We will  provide an introduction to the high-level C++ programming model Kokkos https://github.com/kokkos, and show basic code examples  to illustrate the following concepts through hands-on sessions:

  • hardware portability: design an algorithm once and let the Kokkos back-end (OpenMP, CUDA, ...) actually derive an efficient low-level implementation;
  • efficient architecture-aware memory containers: what is a Kokkos::view;
  • revisit fundamental parallel patterns with Kokkos: parallel for, reduce, scan, ... ;
  • explore some mini-applications.

Several detailed examples in C/C++/Fortran will be used in hands-on session on the high-end hardware platform Jean Zay (http://www.idris.fr/jean-zay/), equipped with Nvidia Tesla V100 GPUs.

Prerequisites:
Some basic knowledge of the CUDA programming model and of C++.

The agenda of this meeting is empty