This module focuses on two state-of-the-art programming models for HPC applications: OpenMP and MPI. For both programming models, basic as well as selected advanced topics (e.g. for MPI, newer features such as non-blocking or sparse neighbourhood collectives) are presented. For the MPI session, special attention is also paid to best practices for achieving good program performance, based on the presenter’s experience from the support of recent PRACE Preparatory Access Type C projects.
Intel Xeon Phi Programming
In this module, Intel’s Many Integrated Core (MIC) architecture is introduced. The session covers various programming models for Intel Xeon Phi coprocessors (such as native mode vs. offload mode, OpenMP and MPI parallelisation, etc.) as well as some selected optimisation techniques. Hands-on sessions are planned to take place on an Intel Xeon Phi-based system at VSB.
This module covers parallel IO concepts related to parallel file systems, IO techniques and performance analysis. Furthermore, it introduces the IO libraries MPI-IO and SIONlib and the high-level libraries HDF5 and NetCDF. The theoretical part will be complemented by practical exercises for each presented library.
The Portable Extensible Toolkit for Scientific computing (PETSc) is a modular library for linear algebra, non-linear solvers, time integrators, optimization, and spatial discretization. Solver configuration and diagnostics are valuable skills for users, whether calling PETSc directly or via one of the many higher-level packages that access PETSc solvers. The tutorial will start with the fundamental linear algebra components, then proceed to principles of preconditioning and Krylov solvers, convergence diagnostics, performance analysis, and the higher-level solver interfaces. It will contain hands-on exercises to build the skills necessary to evaluate methods and design solvers for complex problems in science and engineering.
Tools for Performance Analysis
This module gives an introduction to effective strategies for analysing the performance and IO behaviour of HPC applications. The focus will lie on HPCToolkit for performance analysis as well as on Darshan, Vampir and TAU for IO profiling and IO tracing. Hands-on sessions are intended to lower the threshold for attendees to actually use these tools in their everyday work.
Advanced Parallel Programming
The first session deals with exploiting parallelism on multi-core CPUs, taking memory hierarchies into account. The different levels of parallelism implemented in hardware are presented, and for each level the hardware implementation is illustrated. We analyse the relevance of each level from a programmer’s point of view and present memory hierarchies in more detail. We show the motivation for caching in hardware and the kinds of problems that arise from caching in a parallel context. Finally, the roofline model is presented, which allows the performance of parallel algorithms to be estimated.
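As a rough illustration of the roofline idea mentioned above, the following minimal Python sketch bounds a kernel's attainable performance by the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte moved). The function names and the machine figures (500 GFLOP/s peak, 50 GB/s bandwidth) are assumptions chosen for this example, not values from the course material.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: min(compute peak, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) above which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine parameters, assumed for illustration only.
    peak = 500.0       # GFLOP/s
    bandwidth = 50.0   # GB/s

    # Vector triad a[i] = b[i] + s * c[i] in double precision:
    # 2 flops per iteration, 3 * 8 bytes moved (write-allocate traffic ignored).
    kernels = {
        "vector triad": 2.0 / 24.0,
        "high-intensity kernel (assumed 20 flop/byte)": 20.0,
    }

    print(f"ridge point: {ridge_point(peak, bandwidth):.1f} flop/byte")
    for name, intensity in kernels.items():
        bound = attainable_gflops(peak, bandwidth, intensity)
        limit = "memory" if intensity < ridge_point(peak, bandwidth) else "compute"
        print(f"{name}: at most {bound:.1f} GFLOP/s ({limit} bound)")
```

Kernels whose arithmetic intensity lies below the ridge point are limited by memory bandwidth under this model, so improving data reuse in cache tends to matter more for them than adding floating-point throughput.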
The second session presents an introduction to vectorization on Intel x86 CPUs from a programmer’s point of view. The different levels of control that a programmer has over vectorization are shown. Special focus is placed on the auto-vectorization support of the Intel C/C++ compiler.