May 20 – 22, 2019
Europe/Berlin timezone

LRZ      intel    PRACE


Given the ever-growing complexity of computer architectures, code optimization has become the main route to keeping pace with hardware advancements and making effective use of current and upcoming High Performance Computing systems.

Have you ever asked yourself:

  • Where does the performance of my application lie?
  • What is the maximum speed-up achievable on the architecture I am using?
  • Does my implementation match the HPC objectives?

In this workshop, we will answer these questions and provide a unique opportunity to learn techniques and methods for improving code, enabling new hardware features and using the roofline model to visualize the potential benefits of an optimization process.
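As a rough illustration of the roofline idea mentioned above, the following minimal sketch computes the attainable performance of a kernel as the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity. The function names and the numbers in the usage note are hypothetical and are not taken from the course material or from any Intel tool.

```cpp
#include <algorithm>
#include <cassert>

// Basic roofline bound: a kernel cannot run faster than the peak compute
// rate, nor faster than the memory system can feed it.
//   peak_gflops   : peak floating-point rate of the machine (GFLOP/s)
//   bandwidth_gbs : sustained memory bandwidth (GB/s)
//   intensity     : arithmetic intensity of the kernel (FLOP per byte moved)
double roofline_gflops(double peak_gflops, double bandwidth_gbs, double intensity) {
    return std::min(peak_gflops, bandwidth_gbs * intensity);
}

// Arithmetic intensity at which the memory roof meets the compute roof.
// Kernels below this value are memory-bound, kernels above it compute-bound.
double ridge_point(double peak_gflops, double bandwidth_gbs) {
    assert(bandwidth_gbs > 0.0);
    return peak_gflops / bandwidth_gbs;
}
```

For example, on an assumed machine with a peak of 1000 GFLOP/s and 100 GB/s of bandwidth, a kernel with an intensity of 0.5 FLOP/byte is bounded by 50 GFLOP/s, so improving its memory access pattern is likely to pay off more than adding vector instructions.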

We will begin with a description of the latest microprocessor architectures and of how developers can efficiently use modern HPC hardware, in particular the memory hierarchy and the vector units via SIMD programming and AVX-512 optimization.

The attendees are then guided through the optimization process by means of hands-on exercises, and learn how to enable vectorization using simple pragmas and more effective techniques, like changing data layout and alignment.
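The two techniques named above can be sketched as follows. This is only an illustrative example under stated assumptions, not the course exercise: the type and function names are hypothetical, the loop uses the standard OpenMP `#pragma omp simd` hint (honoured only when the compiler's OpenMP SIMD support is enabled, and otherwise ignored), and data alignment is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AoS): the x, y, z of one particle are adjacent in
// memory, so a loop over all x values accesses memory with a stride of three.
struct ParticleAoS { float x, y, z; };

// Structure of Arrays (SoA): each coordinate is stored contiguously,
// which gives the unit-stride access that vector units prefer.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

// Convert an AoS container into the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
    }
    return soa;
}

// Translate all particles by (dx, dy, dz). The pragma asks the compiler to
// vectorize the loop; the iterations are independent, so this is safe.
void translate(ParticlesSoA& p, float dx, float dy, float dz) {
    assert(p.x.size() == p.y.size() && p.y.size() == p.z.size());
    float* x = p.x.data();
    float* y = p.y.data();
    float* z = p.z.data();
    const std::size_t n = p.x.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += dx;
        y[i] += dy;
        z[i] += dz;
    }
}
```

Whether a given loop was actually vectorized is best confirmed from the compiler's optimization report rather than assumed from the presence of the pragma.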

The work is guided by hints from the Intel® compiler reports (first day) and from Intel® Advisor (second day). On the second day, besides Intel Advisor, the participants will also be introduced to Intel® VTune™ Amplifier and Intel Application Performance Snapshot as tools for investigating and improving the performance of an HPC application. This year the workshop will consist of three days: we will dedicate most of the third day to the Intel Math Kernel Library (MKL), in order to show how to gain performance through the use of libraries.

We also provide an N-body code to support the described optimization techniques with practical hands-on exercises.
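The course code itself is not reproduced here. For readers unfamiliar with the problem, the sketch below shows what the inner loop of a direct-summation N-body solver typically looks like. Everything in it is an assumption made for illustration: the `Bodies` type, the function name, the unit gravitational constant (G = 1) and the softening length `eps` are hypothetical and do not come from the workshop material.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Positions and masses stored as separate contiguous arrays (SoA layout).
struct Bodies {
    std::vector<float> x, y, z, m;
};

// Gravitational acceleration on body i from all bodies, with G = 1.
// The softening length eps > 0 avoids a division by zero for j == i,
// whose contribution is then exactly zero because the distance is zero.
void acceleration(const Bodies& b, std::size_t i, float eps,
                  float& ax, float& ay, float& az) {
    const std::size_t n = b.x.size();
    assert(i < n && eps > 0.0f);
    const float* x = b.x.data();
    const float* y = b.y.data();
    const float* z = b.z.data();
    const float* m = b.m.data();
    const float xi = x[i], yi = y[i], zi = z[i];

    float sx = 0.0f, sy = 0.0f, sz = 0.0f;
    // The reduction clause tells the compiler that the three sums may be
    // accumulated in vector lanes and combined at the end of the loop.
    #pragma omp simd reduction(+:sx, sy, sz)
    for (std::size_t j = 0; j < n; ++j) {
        const float dx = x[j] - xi;
        const float dy = y[j] - yi;
        const float dz = z[j] - zi;
        const float r2 = dx * dx + dy * dy + dz * dz + eps * eps;
        const float inv_r3 = 1.0f / (r2 * std::sqrt(r2));
        sx += m[j] * dx * inv_r3;
        sy += m[j] * dy * inv_r3;
        sz += m[j] * dz * inv_r3;
    }
    ax = sx;
    ay = sy;
    az = sz;
}
```

A kernel of this shape is a common optimization target because it combines a regular loop, a reduction and a square root, so the effect of data layout and vectorization choices is easy to observe.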

The course is a PRACE training event.

Learning Goals

Through a sequence of simple, guided examples of code modernization, the attendees will develop an awareness of the features of multi- and many-core architectures that are crucial for writing modern, portable and efficient applications.

A special focus will be placed on scalar and vector optimizations for the latest Intel® Xeon® Scalable processor, code-named Skylake, used in the SuperMUC-NG machine at LRZ. The tutorial will include presentations and demo sessions.

The workshop alternates lectures and practical sessions. Here is a preliminary outline:

Day 1: main ideas and compiler reports

09:00-09:45       Introduction

09:45-10:15       Introduction to LRZ systems and software

10:15-10:30       Login to hands-on cloud machines

10:30-11:00       Coffee Break

11:00-12:00       Code modernization approach

12:00-12:30       Scalar optimization

12:30-13:30       Lunch

13:30-14:30       Compiler autovectorization

14:30-15:00       Data layout from AoS to SoA

15:00-15:30       Coffee Break

15:30-16:00       Memory access optimization

16:00-16:45       SDLT (Intel SIMD Layout Templates) / Explicit vectorization / Skylake optimization

16:45-17:00       Wrap-up

Day 2: performance tools

09:00-09:30       Introduction to the roofline model

09:30-10:30       Intel Advisor analysis

10:30-11:00       Coffee Break

11:00-12:30       Intel Advisor hands-on

12:30-13:30       Lunch

13:30-14:15       Introduction to VTune

14:15-15:00       VTune demo

15:00-15:30       Coffee Break

15:30-16:00       Introduction to APS

16:00-16:30       APS demo / hands-on

16:30-17:00       Wrap-up

Day 3: performance libraries (preliminary)

Intel MKL:

      Introduction and General Tips

      Sparse BLAS

      Sparse Solver

Please bring your own laptop (with X11 support and an ssh client installed) for the hands-on sessions! For GUI applications, vncviewer must also be installed.


About the Lecturers

Fabio Baruffa is a software technical consulting engineer in the Developer Products Division (DPD) of the Software and Services Group (SSG) at Intel. He works in the compiler team and provides customer support in the high performance computing (HPC) area. Prior to joining Intel, he worked as an HPC application specialist and developer in the largest supercomputing centers in Europe, mainly the Leibniz Supercomputing Center and the Max Planck Computing and Data Facility in Munich, as well as Cineca in Italy. He has been involved in software development, analysis of scientific code and optimization for HPC systems. He holds a PhD in Physics from the University of Regensburg for his research in the area of spintronic devices and quantum computing.

Gennady Fedorov is a Technical Consulting Engineer supporting technical and Intel Performance Libraries (IPP, MKL and DAAL) within the Intel Architecture, Graphics and Software Group at Intel in Russia. His focus areas are image processing, cryptography, compression techniques, High Performance Computing and Artificial Intelligence.

Luigi Iapichino holds the position of scientific computing expert at LRZ and is a former member of the Intel Parallel Computing Center. His main tasks are code modernization for many-core and multi-core systems, and HPC high-level support in the PRACE framework. He received a PhD in physics from the Technical University of Munich in 2005, working at the Max Planck Institute for Astrophysics. Before moving to LRZ in 2014, he worked at the Universities of Würzburg and Heidelberg, where he was involved in research projects related to computational astrophysics. He is the team lead of the LRZ Application Lab for Astro and Plasma Physics (AstroLab).

Gerald Mathias has worked in application support for the HPC systems at LRZ since 2015 and leads the Biolab@LRZ. After his PhD in Computational Biophysics at the LMU Munich, he joined the chair of Theoretical Chemistry at the RUB in Bochum as a postdoc. He is experienced in the development and optimization of highly parallel ab initio and force field based molecular dynamics codes, both in Fortran and C.

Michael Steyer is a Technical Consulting Engineer supporting technical and High Performance Computing segments within the Intel Architecture, Graphics and Software Group at Intel in Germany. His focus areas are High Performance Computing and Artificial Intelligence.

Kursraum 1 (H.U.002)
Boltzmannstr. 1 85748 Garching b. München Germany


Attendees should be comfortable with C/C++ or Fortran programming and with basic Linux commands such as make and ssh. No previous experience with vectorization, parallelization or profiling tools is required.
Language: English
Further information: Travel info, hotel info
Registration: Via


Fabio Baruffa (Intel), Gennady Fedorov (Intel), Gerald Mathias (LRZ), Luigi Iapichino (LRZ), Michael Steyer (Intel).
Fee: This course is a PRACE Advanced Training Center event. Therefore, the course is free of charge for all participants from the EU or from PRACE member countries.