14-15 July 2014
Stuttgart (Germany)
CET timezone
This course teaches performance engineering approaches on the compute node level. "Performance engineering" as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a "holistic" node-level performance engineering strategy, apply it to different algorithms from computational science, and also show how an awareness of the performance features of an application may lead to notable reductions in power consumption.
Big seminar room
HLRS, University of Stuttgart, Allmandring 30, D-70569 Stuttgart, Germany
Agenda 1st day: 8:30 - 9:00 local registration, 9:00 - 17:00 course
2nd day: 9:00 - 17:00 course
Detailed Content


  • Intel and AMD x86 architectures
  • ccNUMA
  • Performance modeling & engineering approaches
  • Our Approach

Practical performance analysis

  • The LIKWID tools
  • Typical performance patterns

Microbenchmarks and the memory hierarchy                

  • Understanding the memory hierarchy
    • Data transfer between memory levels
    • Write allocate vs. NT stores
    • Modeling of cache hierarchies
    • Contention
  • NUMA effects - anisotropy and asymmetry

Typical node-level software overheads

  • Cost of synchronization
  • Work Distribution

Example Problem: The 3D Jacobi solver

  • Core-level optimizations
    • Blocking
    • Non-temporal stores
    • SIMD vectorization (SSE, AVX)
  • Multithreading - contention at different memory hierarchies
  • Temporal Blocking

Example Problem: The Lattice-Boltzmann Method (LBM)

  • Introduction
  • Roofline Model
  • Data layout
  • Non-temporal stores
  • Model for in-cache data & multicore scaling
  • Sparse representation and options for Propagation

Example Problem: Sparse Matrix-Vector Multiplication

  • Data layouts
  • Performance model - CPU vs. GPU
  • Bandwidth reduction

Example Problem:  A backprojection algorithm for CT reconstruction

  • The algorithm
  • Naïve analysis
  • Detailed analysis and performance model  
  • Optimizations

Energy & Parallel Scalability

  • Energy consumption of modern processors
  • The energy-to-solution metric
  • Performance engineering == power engineering
  • Case studies
Language English
Teachers Dr. Georg Hager (RRZE), Dr.-Ing. Jan Treibig (RRZE) (HPC, Uni. Erlangen)
Registration For academic participants within Europe or PRACE: see the Registration button on the left. After registering, you will receive an automated "congratulations" email confirming your registration. This email means that you have a guaranteed seat in the course and should organize your travel.

For other participants, please apply for the HLRS course 2014-NLP.
Deadline for registration is June 15, 2014
Fee Members of German universities and public research institutes: none
Members of universities and public research institutes within Europe or PRACE: none
All other participants (not from academia, or from outside Europe), please apply for the HLRS course 2014-NLP.
Prerequisites Participants must have basic knowledge of programming in Fortran or C.
See HLRS-travel-info. The nearest public transport stops are "Universität, Stuttgart" (S-Bahn station, 15 min on foot) and "Lauchhau, Stuttgart" (bus stop, 4 min on foot to HLRS; served by bus lines 84, 92, 746, 747, and 748, but not 82, from S-Bahn station "Universität, Stuttgart", and by bus line 81 from S-Bahn station "Stuttgart-Vaihingen").
Accommodation: see also the additional hotel list and HLRS-travel-info. Private bed and breakfast accommodation is also available (and may be cheaper than the hotels); see, e.g., www.night-and-day.de or www.nd-bed-breakfast.de. A youth hostel is also available.
Further links: Online-Stadtplan des Stadtmessungsamtes Stuttgart or www.city-map.de.
Local Organizer Gabi Kallenberger, phone 0711 685 65828, kallenberger@hlrs.de; Rolf Rabenseifner, phone 0711 685 65530, rabenseifner@hlrs.de
Cancelation policy If you cannot come to the course, please send an email to the organizer as soon as possible. This allows us to accept additional participants from the waiting list. There is no cancelation fee.
NO-SHOW: Registered persons who neither cancel nor show up, without giving any reason, are blocked from all of our workshops for the next year (because it is too expensive to produce unused copies of the slides for them).
Limit Maximum of 50 participants in Stuttgart (according to the seats in the rooms).
Handouts Each participant will get a paper copy of all slides.

If you have any questions, please consult the course forum page or click on the contact link on the left to send an email to the local organisers.