Contents
As computer architectures grow ever more complex, code optimisation has become the main route to keeping pace with hardware advancements and making effective use of current and upcoming High Performance Computing systems.
Have you ever asked yourself:
- Where does the performance of my application lie?
- What is the maximum speed-up achievable on the architecture I am using?
- Does my implementation meet the HPC objectives?
In this workshop, we will answer these questions and provide a unique opportunity to learn techniques, methods and solutions for improving code, enabling new hardware features and using the roofline model to visualise the potential benefits of an optimisation process.
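In its basic form, the roofline model bounds attainable performance by the smaller of two limits: the peak compute throughput, and the memory bandwidth multiplied by the arithmetic intensity of the kernel. A minimal sketch in C, assuming hypothetical function names and illustrative machine numbers (not measurements of any LRZ system):

```c
#include <assert.h>

/* Attainable performance in GFLOP/s under the basic roofline model:
 * min(peak compute, memory bandwidth * arithmetic intensity).
 * All names here are hypothetical and chosen for illustration only. */
double roofline_gflops(double peak_gflops,
                       double bandwidth_gbs,
                       double intensity_flop_per_byte)
{
    assert(peak_gflops > 0.0 && bandwidth_gbs > 0.0);
    assert(intensity_flop_per_byte >= 0.0);

    /* Performance ceiling imposed by memory traffic. */
    double memory_bound = bandwidth_gbs * intensity_flop_per_byte;

    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Arithmetic intensity (FLOP/byte) at which a kernel stops being
 * memory-bound and becomes compute-bound. */
double ridge_point(double peak_gflops, double bandwidth_gbs)
{
    assert(peak_gflops > 0.0 && bandwidth_gbs > 0.0);
    return peak_gflops / bandwidth_gbs;
}
```

For example, with an assumed peak of 1000 GFLOP/s and 100 GB/s of memory bandwidth, a kernel at 2 FLOP/byte is limited to 200 GFLOP/s, while any kernel above the ridge point of 10 FLOP/byte is compute-bound.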
We will begin with a description of the latest microprocessor architectures and of how developers can use modern HPC hardware efficiently, in particular the memory hierarchy and the vector units (via SIMD programming and AVX-512 optimisation).
Attendees are then guided through the optimisation process by means of hands-on exercises and learn how to enable vectorisation using simple pragmas and more effective techniques, such as changing data layout and alignment.
The work is guided by hints from the Intel® compiler reports and from Intel® Advisor. In addition to Intel® Advisor, participants will be introduced to Intel® VTune™ Amplifier, Intel® Application Performance Snapshot and LIKWID as tools for investigating and improving the performance of an HPC application.
We provide an N-body code to support the described optimisation techniques with practical hands-on exercises.
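To illustrate the pragma and data-layout techniques described above, the following sketch stores particle data as a structure of arrays (SoA) and marks the position-update loop with `#pragma omp simd`. It is not taken from the workshop's N-body code: the type and function names are hypothetical, and the pragma only takes effect when the compiler enables OpenMP SIMD support (for example GCC's `-fopenmp-simd`); otherwise it is ignored and the loop remains correct.

```c
#include <assert.h>
#include <stddef.h>

/* Structure of arrays: each coordinate is stored contiguously, so a
 * vector unit can load consecutive x[i], x[i+1], ... with unit stride.
 * (Hypothetical layout, for illustration only.) */
typedef struct {
    float *x, *y, *z;    /* positions  */
    float *vx, *vy, *vz; /* velocities */
} ParticlesSoA;

/* Advance all positions by one time step dt: pos += vel * dt.
 * Assumes the six arrays do not overlap and each holds n elements. */
void advance_positions(ParticlesSoA *p, size_t n, float dt)
{
    assert(p != NULL);

    float *x = p->x, *y = p->y, *z = p->z;
    const float *vx = p->vx, *vy = p->vy, *vz = p->vz;

    /* Tell the compiler the iterations are independent and may be
     * executed in SIMD lanes. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        x[i] += vx[i] * dt;
        y[i] += vy[i] * dt;
        z[i] += vz[i] * dt;
    }
}
```

With an array-of-structures layout, the same loop would access memory with a stride equal to the size of one particle record, which typically makes vector loads less efficient; this is the motivation for the layout change.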
You can ask Intel in the Q&A session about how to optimise your code. Please provide a description of your code in the registration form.
Learning Goals
Through a sequence of simple, guided examples of code modernisation, attendees will develop an awareness of the features of multi-core and many-core architectures that are crucial for writing modern, portable and efficient applications.
A special focus will be dedicated to scalar and vector optimisations for the Intel® Xeon® Scalable processor, code-named Skylake, utilised in the SuperMUC-NG machine at LRZ.
The workshop alternates between lectures and practical sessions.
Preliminary Agenda
Session | Topic (Lecturers)
1st day morning | Intro (Volker Weinberg)
1st day afternoon | Intel Compiler & Vectorization (Igor Vorobtsov / Alina Shadrina)
2nd day morning | Roofline Model (Jonathan Coles)
2nd day afternoon | VTune (Michael Steyer)
3rd day morning | LIKWID (Carla Guillen / Thomas Gruber)
3rd day afternoon | Optimisation highlights by LRZ (CXS Group LRZ)
3rd day Q&A | All
Recommended Access Tools
- Exercises will be done on the CooLMUC2 cluster at LRZ, with 28-way Haswell-based nodes and an FDR14 InfiniBand interconnect.
- Please use your own laptop or PC with X11 support and an ssh client installed for the hands-on sessions.
- Under Windows
  - We recommend installing MobaXterm (https://mobaxterm.mobatek.net/download-home-edition.html), a convenient tool that also includes an X11 server.
  - Alternatively, install and run the Xming X11 server for Windows (https://sourceforge.net/projects/xming/) and then install and run the terminal software PuTTY (https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).
- Under macOS
  - Install XQuartz for X11 support on macOS: https://www.xquartz.org/
- Under Linux
  - ssh and X11 support come with all distributions.
The workshop is a PRACE training event organised by LRZ in cooperation with Intel and NHR@FAU.
About the Lecturers
Jonathan Coles works in the AstroLab of the Computational X Support (CXS) group at LRZ. He holds a B.Sc. and M.Sc. in Computer Science from RIT in the United States and a PhD in Computational Astrophysics from the University of Zurich. After completing a postdoc position in Zurich, Jonathan moved to Paris to work at the University of Versailles in close collaboration with the CEA and Intel Corp. There he developed a highly parallel and distributed implementation of the Fast Multipole Method (FMM) using OpenMP and MPI for molecular dynamics applications. In 2016, he moved to Munich to work at the TUM in the Biophysics department, where he continued to develop the FMM code on SuperMUC. Jonathan has been employed at LRZ since September 2021.
Thomas Gruber (né Röhl) works in the HPC group of NHR@FAU. He leads the development of the performance tool suite LIKWID, which comprises easy-to-use tools for hardware performance monitoring, affinity control and micro-benchmarking. He also works on projects involving monitoring and analysis of hardware performance data.
Carla Guillen works as a researcher in the application support group at the LRZ. She obtained her PhD in computer science at the Technische Universitaet Muenchen in 2015. She joined the LRZ in 2009 and has been working in the fields of system-wide performance monitoring and energy optimisation of large-scale clusters.
Gerald Mathias leads the Computational X Support (CXS) group at LRZ. After his PhD in Computational Biophysics at the LMU Munich, he joined the chair of Theoretical Chemistry at the RUB in Bochum as a postdoc. He is experienced in the development and optimisation of highly parallel ab initio and force-field-based molecular dynamics codes, both in Fortran and C.
Edmund Preiss is a European Business Development Manager for Intel’s Software Developer Tools, a position he has held for 14+ years.
Edmund Preiss joined Intel in 1988 and has since managed various product marketing, technical and business development programs, projects and teams.
He holds a Diploma of Electronic Engineering and brings more than 35 years of industry experience. Besides Intel, he has worked in the semiconductor business for the following companies: Siemens Semiconductor Components Division, Thomson Semiconductor and ST Microelectronics.
Alina Shadrina is a Technical Consulting Engineer supporting Intel Compilers in Russia and EMEA. Alina holds a Master of Science degree in Applied Mathematics with a specialization in Data Science. Her focus areas are High-Performance Computing and enterprise applications in C++/DPC++ and Fortran.
Michael Steyer is a Technical Consulting Engineer supporting technical and High Performance Computing segments within the Intel Architecture, Graphics and Software Group at Intel in Germany. His focus areas are High Performance Computing and Artificial Intelligence.
Dmitry Tarakanov is a Software Technical Consulting Engineer with more than 20 years of experience in the areas of software development, application tuning and developer support. Dmitry holds a Master of Science degree in Radiophysics. His focus areas are High Performance Computing and Telecom Applications.
Igor Vorobtsov has more than 11 years of experience in the areas of C/C++ and Fortran compilers, application tuning and developer support. Igor holds a Master of Science degree in Applied Mathematics. Since joining Intel in 2008, Igor has worked as a Technical Consulting Engineer supporting software developers throughout the EMEA region. Igor has a broad array of application experience, including enterprise applications and high performance computing environments.
Prerequisites
Attendees should be comfortable with either the C/C++ or Fortran programming language and with basic Linux commands such as make and ssh. No previous experience with vectorisation, parallelisation or profiling tools is required.
Language
English
Lecturers
Jonathan Coles, Gerald Mathias, Carla Guillen (LRZ)
Thomas Gruber (NHR@FAU)
Edmund Preiss, Alina Shadrina, Michael Steyer, Dmitry Tarakanov, Igor Vorobtsov (Intel)
Prices and Eligibility
The course is open and free of charge for people from academia and industry from the Member States (MS) of the European Union (EU) and Associated Countries to the Horizon 2020 programme.
Registration
Please register with your official e-mail address to prove your affiliation.
Withdrawal Policy
See Withdrawal