One of the greatest challenges to running parallel applications on
large numbers of processors is how to handle file IO. Standard Unix IO
routines are not designed with parallelism in mind, and IO overheads
can grow to dominate the overall runtime. Parallel file systems are
optimised for large volumes of data, but performance can be far from
optimal if every process opens its own file or if all IO is funnelled
through a single controller process.
This hands-on course explores a range of issues related to parallel
IO. It uses ARCHER2 and its parallel Lustre file system as a platform
for the exercises; however, almost all the IO concepts and performance
considerations are applicable to any parallel system.
We will give a general overview of how parallel IO is implemented in
MPI-IO, since its routines are ultimately used by higher-level
libraries such as HDF5 and NetCDF. A good understanding of the
performance characteristics of MPI-IO is therefore very useful in
optimising the IO performance of most parallel applications.
Prerequisites: The course assumes a good understanding of basic MPI
programming in C, C++ or Fortran. Knowledge of MPI derived datatypes
is useful but not essential.