Efficient Parallel IO on ARCHER @ EPCC

Daresbury, UK


STFC Daresbury Laboratory, Sci-Tech Daresbury, Daresbury, Warrington WA4 4AD

Please note this course will be held in Daresbury:
STFC Daresbury Laboratory
Sci-Tech Daresbury

One of the greatest challenges in running parallel applications on large numbers of processors is handling file IO: standard IO routines are not designed with parallelism in mind. Parallel file systems such as Lustre are optimised for large data transfers, and performance can be far from optimal if many files are opened at once.

The IO part of the MPI standard gives programmers access to efficient parallel IO in a portable fashion. However, there are a large number of different routines available and some can be difficult to use in practice. Despite its apparent complexity, MPI-IO adopts a very straightforward high-level model. If used correctly, almost all the complexities of aggregating data from multiple processes can be dealt with automatically by the library.

The first day of the course will cover the MPI-IO standard, developing IO routines for a regular domain decomposition example. It will also briefly cover higher-level libraries such as HDF5 and NetCDF.
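For a regular domain decomposition, the core task is mapping each process's local block to its place in a single shared file. In MPI-IO this mapping is normally described with `MPI_Type_create_subarray` and installed with `MPI_File_set_view`, and the library then handles the data movement. The plain C sketch below (no MPI calls) only computes that mapping, to show what the file view encodes. It assumes a row-major global array whose extents divide evenly over the process grid. The names `decomp2d`, `file_offset` and `contiguous_runs` are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Hypothetical description of a regular 2D block decomposition:
   a global NX x NY array stored in row-major (C) order in one file,
   split over a PX x PY process grid.
   Assumes NX % PX == 0 and NY % PY == 0. */
typedef struct {
    size_t NX, NY;   /* global array extents */
    size_t PX, PY;   /* process grid extents */
} decomp2d;

/* File offset, in elements, of local element (i, j) owned by the
   process at grid coordinates (px, py). A subarray filetype used as
   the file view describes this same mapping to MPI-IO. */
size_t file_offset(const decomp2d *d, size_t px, size_t py,
                   size_t i, size_t j)
{
    size_t nx = d->NX / d->PX;   /* local rows          */
    size_t ny = d->NY / d->PY;   /* local columns       */
    size_t gi = px * nx + i;     /* global row index    */
    size_t gj = py * ny + j;     /* global column index */
    return gi * d->NY + gj;
}

/* Number of separate contiguous runs one process's block occupies in
   the file. Each local row is its own run of ny elements, unless the
   block spans whole global rows (PY == 1), in which case the entire
   block is a single contiguous run. */
size_t contiguous_runs(const decomp2d *d)
{
    return d->PY == 1 ? 1 : d->NX / d->PX;
}
```

With an 8 x 8 array on a 2 x 2 grid, each process's data is split into four separate runs of four elements. Collective MPI-IO can merge such scattered pieces from all processes into large contiguous file accesses, which is why describing the whole pattern to the library, rather than issuing many small writes, tends to pay off.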

The second day will concentrate on ARCHER, covering how to configure the Lustre file system for best performance and how to tune the Cray MPI-IO library. Case studies from real codes will also be presented.
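A key part of configuring Lustre is file striping. Under Lustre's default round-robin (RAID-0) layout, a file is cut into stripes of a fixed size and dealt out in turn across a set number of object storage targets (OSTs). The C sketch below only models that layout, to illustrate why large, stripe-aligned transfers spread load across OSTs. It does not configure anything; on the real system striping is set with the `lfs setstripe` command or through MPI-IO hints. The helper names `ost_for_offset` and `osts_touched` are hypothetical, and OST indices are relative to the file's first OST.

```c
#include <stddef.h>

/* Relative OST index holding the byte at file offset `offset`,
   assuming a round-robin layout of stripe_size-byte stripes over
   stripe_count OSTs. */
size_t ost_for_offset(size_t offset, size_t stripe_size,
                      size_t stripe_count)
{
    return (offset / stripe_size) % stripe_count;
}

/* Number of distinct OSTs touched by a transfer of `len` bytes
   (len > 0) starting at `offset`, under the same assumed layout. */
size_t osts_touched(size_t offset, size_t len, size_t stripe_size,
                    size_t stripe_count)
{
    size_t first = offset / stripe_size;            /* first stripe index */
    size_t last = (offset + len - 1) / stripe_size; /* last stripe index  */
    size_t stripes = last - first + 1;
    /* Past stripe_count stripes, the layout wraps onto OSTs already used. */
    return stripes < stripe_count ? stripes : stripe_count;
}
```

With 1 MiB stripes over 4 OSTs, a 1 MiB write aligned to a stripe boundary touches one OST, the same write shifted by half a stripe touches two, and a 16 MiB write uses all four.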

Prerequisites: The course assumes a good understanding of basic MPI programming in Fortran, C or C++. Knowledge of MPI derived datatypes would be useful but not essential.