With Moore's law petering out and Dennard scaling at an end, the pace of performance growth expected between generations of High Performance Computing (HPC) systems has led to power-constrained architectures and systems. In addition, power consumption represents a significant cost factor in the overall economy of an HPC system. For these reasons, in recent years researchers, supercomputing centres and major vendors have developed new tools and methodologies to measure and optimise the energy consumption of large-scale high performance system installations. Because the energy consumption, power consumption and execution time of a user's application are linked, it is important for tools and methodologies to consider all of these aspects, giving the final user and the system administrator the capability of finding the best configuration for different high-level objectives.
The school will give an introductory course on the fundamental concepts of power consumption and energy efficiency in HPC systems. It will then focus on the mechanisms that today's computing elements and systems provide for monitoring and controlling power and energy dissipation, and will offer insights into the power management design of the European Processor Initiative. Finally, it will introduce a set of tools for reducing energy consumption in HPC devices and give a hands-on session with them.
The school is organised into four main sessions, guiding the audience from the physical and engineering principles underlying power consumption in supercomputing systems to the practical usage of state-of-the-art tools for monitoring and controlling the energy efficiency of supercomputing machines and workloads. The tools that will be covered are MSR-SAFE (LLNL), MERIC (IT4I), COUNTDOWN (UNIBO) and lo2s (TUD).
Purpose of the course (benefits for the attendees)
By the end of the course, participants will be expected to:
- have a good understanding of the principles underlying power consumption and energy dissipation in high performance computing nodes
- recognise the trade-offs and implications of changing the power consumption of computing systems during the execution of scientific applications
- have a clear idea of the state of the art and of current practices in controlling the power consumption and energy efficiency of supercomputing nodes and processors
- learn the internals and the usage of a set of user-space run-time libraries for controlling and optimising the power consumption and energy efficiency of x86 computing nodes while executing user applications
- learn how to use these tools to optimise the energy consumption of their own codes.
About the tutors
Lubomir Riha is the Head of the Infrastructure Research Lab at IT4Innovations National Supercomputing Center. Previously he was a senior researcher in the Parallel Algorithms Research Lab at IT4Innovations, and a research scientist in the High Performance Computing Lab at George Washington University, ECE Department. He received his PhD and MSc degrees in Electrical Engineering from the Czech Technical University in Prague, Czech Republic, in 2011, and his PhD degree in Computer Science from Bowie State University, USA. He is currently a local principal investigator of the H2020 Center of Excellence project POP2. Previously he was an investigator in the FP7 EXA2CT project and the Intel Parallel Computing Center, as well as a local principal investigator of the H2020-FET HPC READEX project. He is also co-principal developer of the ESPRESO finite element library, which includes a parallel sparse solver designed for supercomputers with tens or hundreds of thousands of cores, with support for both GPU and Intel Xeon Phi accelerators. His research interests are the optimisation of HPC applications, energy-efficient computing, the acceleration of scientific and engineering applications using GPU and many-core accelerators, the development of scalable linear solvers, parallel rendering on new HPC architectures, and signal and image processing.
Ondrej Vysocky received his MSc degree in Computer Science from Brno University of Technology, Czech Republic, in 2016. His master's thesis focused on parallel I/O optimisation. He is currently a PhD student at VSB – Technical University of Ostrava, Czech Republic, and also works at IT4Innovations in the Infrastructure Research Lab. His research is focused on energy efficiency in high performance computing. He was an investigator in the Horizon 2020 READEX project, which dealt with the energy efficiency of High Performance Computing applications using dynamic tuning. Since then he has been developing the MERIC library, a tool for energy measurement and hardware parameter tuning during parallel application runs.
Andrea Bartolini received a PhD degree in Electrical Engineering from the University of Bologna, Italy, in 2011. He is currently an assistant professor in the Department of Electrical, Electronic and Information Engineering (DEI) at the University of Bologna. Previously, he was a post-doctoral researcher in the Integrated Systems Laboratory at ETH Zurich. Since 2007 Dr Bartolini has published more than 100 papers in peer-reviewed international journals and conferences, with a focus on dynamic resource management for embedded and HPC systems. For the past year, Dr Bartolini has led the power management co-design of the European Processor Initiative.
Daniele Cesarini graduated in Computer Engineering from the University of Bologna in 2014, where he also earned a PhD degree in Electrical Engineering from the Department of Electrical, Electronic and Information Engineering in 2019. He is currently an HPC software engineer at CINECA, the Italian National Supercomputing Center, where he works on the performance optimisation of large-scale scientific applications for the new generation of heterogeneous HPC architectures. His research interests also include the development of SW-HW co-design strategies and of algorithms for parallel programming support for energy-efficient HPC systems.
Robert Schöne works as a post-doc at Technische Universität Dresden, where he also received his PhD. His research covers micro-architectural features of processors, as well as tools and methods for measuring and tuning the performance and energy efficiency of parallel applications. After receiving his diploma, he worked on various projects that targeted the measurement and tuning of the energy efficiency of computer systems. Among other things, he described and implemented interfaces that extend performance measurement frameworks for such cases. He was also part of the team that developed the Bull-specific power and energy measurement framework HDEEM. After his PhD, he was the scientific manager of the Horizon 2020 project READEX, which implemented an automated tool suite for energy efficiency optimisation. He currently teaches at the Faculty of Computer Science. Since receiving his diploma, he has published more than 30 papers and has organised or co-organised four workshops with a focus on auto-tuning and energy efficiency.