Jun 27 – 29, 2022
Europe/Berlin timezone


Please register with your official e-mail address to prove your affiliation.


With the ever-growing complexity of computer architectures, code optimisation has become the main route to keeping pace with hardware advancements and making effective use of current and upcoming High Performance Computing systems.

Have you ever asked yourself:

  • Where are the performance bottlenecks of my application?
  • What is the maximum speed-up achievable on the architecture I am using?
  • Does my code scale well across multiple machines?
  • Does my implementation match my HPC objectives?

In this workshop, we will discuss these questions and provide a unique opportunity to learn techniques, methods and solutions for improving code, enabling new hardware features and visualising the potential benefits of an optimisation process.

We will describe the latest micro-processor architectures and how developers can efficiently use modern HPC hardware, including SIMD vector units and the memory hierarchy. We will also touch upon exploiting intra-node and inter-node parallelism.

Attendees will be guided along the optimisation process through the incremental improvement of an example application. Through hands-on exercises they will learn how to enable vectorisation using simple pragmas and more effective techniques like changing data layout and alignment.
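As a rough illustration of the techniques named above, the following C sketch combines a structure-of-arrays data layout, 64-byte aligned allocation and an OpenMP SIMD pragma. It is not the workshop's example application: the names ParticlesSoA, particles_create, particles_scale and particles_free are hypothetical, and the 64-byte alignment is an assumption chosen to match a typical cache line and 512-bit vector width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_BYTES 64  /* assumed alignment: one cache line / one 512-bit vector */

/* Structure-of-arrays layout (hypothetical example type): all x values are
   contiguous in memory, so a loop over x[i] is a unit-stride access. */
typedef struct {
    float *x, *y, *z;
    size_t n;
} ParticlesSoA;

/* Allocate n floats (n > 0) on a 64-byte boundary. aligned_alloc (C11)
   expects the size to be a multiple of the alignment, so round it up. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + ALIGN_BYTES - 1) / ALIGN_BYTES * ALIGN_BYTES;
    float *p = aligned_alloc(ALIGN_BYTES, padded);
    assert(p != NULL && (uintptr_t)p % ALIGN_BYTES == 0);
    return p;
}

ParticlesSoA particles_create(size_t n)
{
    ParticlesSoA p;
    p.n = n;
    p.x = alloc_aligned_floats(n);
    p.y = alloc_aligned_floats(n);
    p.z = alloc_aligned_floats(n);
    return p;
}

void particles_free(ParticlesSoA *p)
{
    free(p->x);
    free(p->y);
    free(p->z);
    p->x = p->y = p->z = NULL;
    p->n = 0;
}

/* Scale all coordinates by s. The pragma asks the compiler to vectorise the
   loop and promises that x, y and z are 64-byte aligned; it is ignored when
   OpenMP SIMD support is not enabled at compile time. */
void particles_scale(ParticlesSoA *p, float s)
{
    float *x = p->x, *y = p->y, *z = p->z;
    const size_t n = p->n;
    #pragma omp simd aligned(x, y, z : 64)
    for (size_t i = 0; i < n; ++i) {
        x[i] *= s;
        y[i] *= s;
        z[i] *= s;
    }
}
```

With GCC or Clang the pragma is typically enabled with -fopenmp-simd, and the compiler's vectorisation report can then be used to check whether the loop was actually vectorised.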

The work is guided by hints from compiler reports and by profiling tools such as Intel® Advisor, Intel® VTune™ Amplifier, Intel® Application Performance Snapshot and LIKWID, which are used to investigate and improve the performance of an HPC application.
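Profiling results of this kind are often read against the roofline model, which appears in the agenda. The sketch below shows only the basic form of that model, attainable performance = min(peak compute, memory bandwidth × arithmetic intensity). The function names roofline_gflops and triad_intensity are hypothetical, and any machine numbers passed in are the caller's assumptions, not measurements of a real system.

```c
#include <assert.h>

/* Attainable performance in GFLOP/s under the basic roofline model:
   the smaller of the compute peak and the memory-bound limit
   (bandwidth in GB/s times arithmetic intensity in flop/byte). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double flops_per_byte)
{
    assert(peak_gflops > 0.0 && bandwidth_gbs > 0.0 && flops_per_byte >= 0.0);
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Arithmetic intensity of the double-precision triad a[i] = b[i] + s * c[i]:
   2 flops per iteration over 3 * 8 bytes of memory traffic
   (write-allocate traffic is ignored in this simple count). */
double triad_intensity(void)
{
    return 2.0 / 24.0;
}
```

For a hypothetical machine with a 1000 GFLOP/s peak and 100 GB/s of memory bandwidth, a kernel at 0.5 flop/byte is limited to 50 GFLOP/s by memory, while a kernel at 20 flop/byte can reach the compute peak.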

You can ask the lecturers in the Q&A session about how to optimise your code. Please provide a description of your code in the registration form.

Learning Goals

Through a sequence of simple, guided examples of code modernisation, attendees will develop an awareness of the features of multi- and many-core architectures that are crucial for writing modern, portable and efficient applications.

A special focus will be dedicated to scalar and vector optimisations for the Intel® Xeon® Scalable processor, code-named Skylake, utilised in the SuperMUC-NG machine at LRZ.

The workshop interleaves lecture and practical sessions.

Preliminary Agenda



1st day morning

Intro (Volker Weinberg)
Intro to LRZ HPC Systems and Software Stack (Gerald Mathias, Nisarg Patel)
Principles of optimization (Jonathan Coles)

1st day afternoon

HPC Architecture, Vectorization
Example code
Data structures
(Jonathan Coles)

2nd day morning

Profiling: Code instrumentation, Roofline Model, Intel Advisor (Jonathan Coles)

2nd day afternoon

Debuggers (Gerald Mathias)
Additional Tools: Valgrind and cache simulators (Josef Weidendorfer)
I/O Considerations (Patrick Böhl)

3rd day morning

LIKWID (Carla Guillen/Thomas Gruber)
HPC report (Carla Guillen)

3rd day afternoon

Optimisation highlights by LRZ (CXS Group LRZ)


The workshop is a PRACE training event organised by LRZ in cooperation with NHR@FAU.

About the Lecturers

Patrick Böhl works in the HPC group at LRZ. He obtained his PhD in theoretical physics at LMU Munich, where he studied nonlinear partial differential equations numerically and analytically. In parallel, he was involved in optimizing an advanced Particle-In-Cell code which was used for conducting huge three-dimensional plasma simulations on SuperMUC(-NG). Patrick joined LRZ in Feb 2020, where his main focus is on I/O-related problems.

Jonathan Coles works in the AstroLab of the Computational X Support (CXS) group at LRZ. He holds a B.Sc. and M.Sc. in Computer Science from RIT in the United States and a PhD in Computational Astrophysics from the University of Zurich. After completing a postdoc position in Zurich, Jonathan moved to Paris to work at the University of Versailles in close collaboration with the CEA and Intel Corp. There he developed a highly parallel and distributed implementation of the Fast Multipole Method (FMM) using OpenMP and MPI for molecular dynamics applications. In 2016, he moved to Munich to work at the TUM in the Biophysics department, where he continued to develop the FMM code on SuperMUC. Jonathan has been employed at LRZ since September 2021.

Thomas Gruber (né Röhl) works in the HPC group of NHR@FAU. He leads the development of the performance tool suite LIKWID, which comprises easy-to-use tools for hardware performance monitoring, affinity control and micro-benchmarking. He also works on projects involving monitoring and analysis of hardware performance data.

Carla Guillen works as a researcher in the application support group at the LRZ. She obtained her PhD in computer science at the Technische Universitaet Muenchen in 2015. She joined the LRZ in 2009 and has been working in the fields of system-wide performance monitoring and energy optimisation of large-scale clusters.

Gerald Mathias leads the Computational X Support (CXS) group at LRZ. After his PhD in Theoretical Biophysics he joined the chair of Theoretical Chemistry at the RUB in Bochum as a postdoc, followed by a habilitation at LMU Munich. He is experienced in the development and optimisation of highly parallel ab initio and force field based molecular dynamics codes, both in Fortran and C.

Josef Weidendorfer leads the Future Computing Group at LRZ, which is developing smooth migration strategies for future HPC systems and evaluating novel technologies. This includes improvement of system level and workload analysis tools as well as novel parallel programming models. Previous research involved best use of accelerators, heterogeneous computing, and tuning strategies for parallel code including dynamic code generation techniques. Josef completed his habilitation at TUM in 2016 on simulation-driven performance analysis for parallel code, especially looking at capturing bottlenecks in the memory hierarchy of modern architectures and presenting them in a way to hint at adequate performance optimizations. He received his Ph.D. from TUM in 2003 for studying load balancing issues in car crash simulation on industrial code at BMW AG.


Attendees should be comfortable programming in either C/C++ or Fortran and with basic Linux commands such as make and ssh. No previous experience with vectorisation, parallelisation or profiling tools is required.





Dr. Patrick Böhl, Dr. Jonathan Coles, Dr. Gerald Mathias, Dr. Carla Guillen, Nisarg Patel, Dr. Josef Weidendorfer (LRZ)

Thomas Gruber (NHR@FAU)

Prices and Eligibility

The course is open and free of charge for people from academia and industry from the Member States (MS) of the European Union (EU) and Associated Countries to the Horizon 2020 programme.

Withdrawal Policy

See Withdrawal
