The training will be fully remote. To participate, you need a personal computer connected to the internet with a working SSH client.
The increase in computational power goes hand in hand with an increase in the size of the data to be managed, on both the input and the output sides. I/O can easily become a bottleneck on large-scale architectures. Understanding parallel file system mechanisms and parallel I/O concepts enables users to make efficient use of existing high-level libraries such as HDF5, NetCDF or SIONlib.
With the growing performance gap between compute and storage, even the best use of I/O bandwidth might not be enough. This is especially critical for checkpointing in the context of fault tolerance. This course offers an introduction to FTI, a library that aims to give computational scientists the means to perform fast and efficient multilevel checkpointing on large-scale supercomputers.
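To make the idea of multilevel checkpointing more concrete, here is a minimal, single-process sketch in Python. It does not use FTI or its API. The function names (`save_checkpoint`, `load_latest`, `run`), the two directories standing in for a fast node-local level and a slower shared level, and the `global_every` and `fail_at` parameters are all assumptions made for illustration only.

```python
import json
import os


def save_checkpoint(directory, step, state):
    """Write the state for `step` into `directory`, replacing any older file."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, "checkpoint.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename last, so a crash mid-write never leaves a truncated checkpoint.
    os.replace(tmp_path, final_path)


def load_latest(directories):
    """Return (step, state) from the newest readable checkpoint, or (0, None)."""
    best = (0, None)
    for directory in directories:
        path = os.path.join(directory, "checkpoint.json")
        try:
            with open(path) as f:
                data = json.load(f)
        except (OSError, ValueError):
            continue  # missing or unreadable level: try the next one
        if data["step"] > best[0]:
            best = (data["step"], data["state"])
    return best


def run(local_dir, global_dir, n_steps, global_every=5, fail_at=None):
    """Toy 'simulation' that sums 1..n_steps, checkpointing at two levels."""
    step, state = load_latest([local_dir, global_dir])
    total = state if state is not None else 0
    while step < n_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        step += 1
        total += step
        save_checkpoint(local_dir, step, total)       # cheap, every step
        if step % global_every == 0:
            save_checkpoint(global_dir, step, total)  # costly, less often
    return total
```

In this sketch, a restart after a failure resumes from the local level when it survives and falls back to the older shared checkpoint otherwise. A real multilevel library also has to deal with distributed state, redundancy across nodes and asynchronous transfers, none of which is modelled here.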
To cope with the growing number of libraries, this course will also introduce the PDI Data Interface, a solution to couple simulation codes with libraries for I/O and post-processing (including HDF5, NetCDF, SIONlib, FTI, ...) based on simple annotations. PDI improves code quality with a) annotations that are independent of the library used, b) an I/O strategy chosen at runtime in YAML and c) negligible overhead when accessing the full power of the underlying libraries.
- 9h30 - 11h: Storage@TGCC & Lustre filesystems (Thomas Leibovici - TGCC, CEA)
- 11h - 12h30: parallel I/O strategies and optimization with a focus on SIONlib (Sebastian Lührs - JSC, FZJ)
- 14h - 14h30: parallel I/O strategies and optimization with a focus on SIONlib, cont'd (Sebastian Lührs - JSC, FZJ)
- 14h30 - 17h: Sequential HDF5 (Matthieu Haefele - LMAP, CNRS)
- 9h30 - 12h30: Parallel HDF5 (Matthieu Haefele - LMAP, CNRS)
- 14h - 17h: NetCDF (Olga Abramkina - MdlS/IDRIS, CNRS)
- 9h30 - 12h30: the FTI fault-tolerance library (Leonardo Bautista Gomez - BSC)
- 14h - 17h: the PDI Data Interface (Julien Bigot - MdlS, CEA)
Instructors: Olga Abramkina (MdlS/IDRIS, CNRS), Leonardo Bautista Gomez (BSC), Julien Bigot (MdlS, CEA), Matthieu Haefele (LMAP, CNRS), Thomas Leibovici (TGCC, CEA), Sebastian Lührs (JSC, FZJ)
Learning outcomes: After this course, participants should understand the trade-offs implied by using a parallel file system, and know how to use parallel I/O libraries efficiently. Participants will also have a basic understanding of and practice with FTI and PDI.
Prerequisites: Knowledge of the C or Fortran programming language and of parallel programming with MPI