The increase in computational power goes hand in hand with an increase in the size of the data to be managed, on both the input and the output sides. IO can easily become a bottleneck on large-scale architectures. Understanding parallel file system mechanisms and parallel IO concepts enables users to make efficient use of existing high-level libraries such as HDF5.
With the widening performance gap between compute and storage, even the best use of IO bandwidth might not be enough, and data reduction based on in-situ or in-transit post-processing becomes a requirement for large-scale codes. A growing number of libraries that support this are becoming available. This course offers an introduction to FlowVR, a technology developed at Inria dedicated to in-situ and in-transit post-processing.
To cope with the proliferation of libraries, this course will also introduce the Parallel Data Interface (PDI), a solution for coupling simulation codes with IO and post-processing libraries through simple annotations. PDI improves code quality with a) annotations that are independent of the library used, b) an IO strategy chosen at runtime in YAML and c) negligible overhead when accessing the full power of the underlying libraries.
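The idea of annotations that stay independent of the IO library can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the PDI API: the `Coupler` class, its `expose` method, the backend names and the dictionary standing in for a YAML configuration are all hypothetical.

```python
import json
import os
import tempfile


# Hypothetical backends: each one receives the name and value of exposed data.
def write_json(name, value, options, store):
    """Write the value to <directory>/<name>.json, overwriting earlier steps."""
    path = os.path.join(options["directory"], name + ".json")
    with open(path, "w") as f:
        json.dump(value, f)


def keep_in_memory(name, value, options, store):
    """Keep the latest value in memory, e.g. for in-situ post-processing."""
    store[name] = value


BACKENDS = {"json": write_json, "memory": keep_in_memory}


class Coupler:
    """Toy stand-in for an annotation layer (hypothetical, not the PDI API)."""

    def __init__(self, config):
        self.config = config  # plays the role of the runtime YAML file
        self.store = {}

    def expose(self, name, value):
        rule = self.config.get(name)
        if rule is None:
            return  # no rule configured for this data: nothing happens
        backend = BACKENDS[rule["backend"]]
        backend(name, value, rule.get("options", {}), self.store)


def simulate(coupler, steps=3):
    """The simulation code only annotates its data; it never names a library."""
    temperature = [0.0, 0.0, 0.0]
    for _ in range(steps):
        temperature = [t + 1.0 for t in temperature]
        coupler.expose("temperature", temperature)


# Same simulation code, two IO strategies selected purely by configuration.
outdir = tempfile.mkdtemp()
file_run = Coupler(
    {"temperature": {"backend": "json", "options": {"directory": outdir}}}
)
simulate(file_run)

memory_run = Coupler({"temperature": {"backend": "memory"}})
simulate(memory_run)
```

In this sketch, switching from file output to in-memory handling changes only the configuration passed to `Coupler`; the body of `simulate` is untouched, which is the property points a) and b) above describe.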
- Day 1: HDF5, pHDF5 and general parallel IO concepts (2x3h, including hands-on)
- Day 2: Lustre file system (1h30)
- Day 2: FlowVR (1h30 + 3h including hands-on)
- Day 3: Parallel Data Interface (3h including hands-on)
- Day 3: Diving deeper into HDF5, FlowVR or PDI, your choice! (3h)
Instructors: M. Haefele (Maison de la Simulation, CNRS), Thomas Leibovici (TGCC, CEA), Bruno Raffin (Inria), Julien Bigot (Maison de la Simulation, CEA)
Learning outcomes: After this course, participants should understand the trade-offs involved in using a parallel file system and know how to use parallel IO libraries efficiently. Participants will also have a basic understanding of, and hands-on practice with, FlowVR and PDI.
Prerequisites: Knowledge of the C or Fortran programming language and of parallel programming with MPI