PRACE Training Centre Events

[POSTPONED] Working effectively with HPC systems @ SNIC (1/1)

Europe/Stockholm
Online

NSC, LiU, Sweden
Description

Note: postponed until early autumn!

In brief

The seminar will present useful tools and best practices for working effectively on HPC systems. It should be of interest to general HPC users at both beginner and intermediate levels. To participate, register below.

Introduction

Working efficiently with HPC starts with the tools you use to interact with the HPC system. It also helps to understand the general anatomy of HPC systems and their storage. Building on these fundamentals, we will give recommendations for organizing data on the system and present examples of different types of file systems (e.g. parallel vs. local) along with their strengths and weaknesses. We will then discuss the concepts of parallelism, scalability and scheduling, and what kinds of operating systems and software to expect on HPC systems. We will go through important considerations when building and installing software. Finally, we will look at different ways of running software on HPC systems and of monitoring it while it runs, to make sure your jobs are properly configured and do not waste resources.

While the content and practices apply to HPC systems in general, we will show examples and tools specific to the NSC clusters, e.g. Tetralith and Sigma.

Schedule

Note: postponed until early autumn!

The day is divided into two main parts, before and after the lunch break. Each part consists of several 20-40 minute blocks with breaks in between, and each block includes time for questions.

10:00-12:00 Part I

12:00-13:00 Lunch

13:00-15:00 Part II

Topics/blocks (preliminary)

  • Welcome, introductions and practicalities
  • Tools at your end (e.g. terminal, SSH configuration, file transfer tools, VNC)
  • HPC system anatomy (login and compute nodes, interconnect, storage)
  • Properties and features of storage areas (e.g. quotas, performance, locality, backups, snapshots, scratch)
  • Concept of parallelism (Amdahl’s law), scalability, scheduling and practical advice for good performance
  • Software on an HPC system (OS, modules, Python environments, the concept of build environments, containers with Singularity)
  • Ideas and strategies for organizing your workflow (data and file management, traceability and reproducibility)
  • Interacting with the Slurm queueing system (requesting resources interactively or in batch)
  • Practical examples (preparing, submitting, monitoring and evaluating job efficiency)
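Amdahl's law, listed among the topics above, explains why requesting more cores does not always pay off: the serial part of a program limits the achievable speedup. The following is a minimal Python sketch, assuming the textbook form S(N) = 1 / ((1 - p) + p/N), where p is the parallelizable fraction of the runtime and N the number of cores. The function names and the example value p = 0.95 are illustrative assumptions, not part of any NSC tool.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share p of the single-core runtime that can be
    parallelized (0 <= p <= 1).
    n_cores: number of cores N the parallel part is spread over.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def parallel_efficiency(parallel_fraction: float, n_cores: int) -> float:
    """Speedup divided by core count; 1.0 means every core does useful work."""
    return amdahl_speedup(parallel_fraction, n_cores) / n_cores


if __name__ == "__main__":
    p = 0.95  # assumed example: 95 % of the runtime can be parallelized
    for n in (1, 8, 32, 128):
        print(f"{n:4d} cores: speedup {amdahl_speedup(p, n):5.2f}, "
              f"efficiency {parallel_efficiency(p, n):6.1%}")
```

With p = 0.95 the speedup can never exceed 1 / 0.05 = 20, however many cores are requested, so parallel efficiency drops steadily as N grows. This is one way to reason about how many cores a job should ask for before extra resources are mostly wasted.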

Presenters

Peter Kjellström, Weine Olovsson, Torben Rasmussen, Hamish Struthers, all at NSC, LiU, Sweden.