Efficient Parallel I/O @ EPCC at Imperial College London

Sherfield Building Level 5 (SALC 10)


Imperial College London South Kensington Campus

Please note: this course will take place at Imperial College London, in SALC 10, Sherfield Building, Level 5, South Kensington Campus.

One of the greatest challenges to running parallel applications on large numbers of processors is how to handle file IO. Standard IO routines are not designed with parallelism in mind, and IO overheads can grow to dominate the overall runtime. Parallel file systems are optimised for large data transfers, but performance can be far from optimal if every process opens its own file or if all IO is funneled through a single master process.

This hands-on course explores a range of issues related to parallel IO. It uses ARCHER and its parallel Lustre file system as a platform for the exercises; however, almost all the IO concepts and performance considerations are applicable to any parallel system.

The IO part of the MPI standard gives programmers access to efficient parallel IO in a portable fashion. However, there are a large number of different routines available and some can be difficult to use in practice. Despite its apparent complexity, MPI-IO adopts a very straightforward high-level model. If used correctly, almost all the complexities of aggregating data from multiple processes can be dealt with automatically by the library.

The first day of the course will cover the MPI-IO standard, developing IO routines for a regular domain decomposition example. It will also briefly cover higher-level standards such as HDF5 and NetCDF, which are built on top of MPI-IO.
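To give a flavour of the regular domain decomposition problem, the following is a minimal sketch in plain Python. It makes no MPI calls. It only mimics the file layout that an MPI-IO file view (for example one built with MPI_Type_create_subarray and MPI_File_set_view) would describe on each process's behalf. The function names, the row-major rank ordering and the requirement that the array divides evenly across the process grid are assumptions made for illustration; they are not taken from the course material.

```python
import struct


def block_offsets(rank, global_shape, proc_grid, itemsize):
    """Return (byte_offset, n_items) for each contiguous run owned by `rank`.

    Assumes a 2D array stored in row-major (C) order in one shared file,
    a row-major ordering of ranks over the process grid, and a global
    array that divides evenly between processes.
    """
    nx, ny = global_shape
    px, py = proc_grid
    if nx % px or ny % py:
        raise ValueError("global array must divide evenly over the process grid")
    lx, ly = nx // px, ny // py      # local block size
    ri, rj = divmod(rank, py)        # position of this rank in the grid
    x0, y0 = ri * lx, rj * ly        # global index of the block's first element
    # Each local row is one contiguous run of `ly` items in the shared file.
    return [((i * ny + y0) * itemsize, ly) for i in range(x0, x0 + lx)]


def write_shared(global_shape, proc_grid):
    """Simulate every rank writing its block into one shared byte buffer.

    Each element is set to its global linear index, so a correct layout
    reproduces the array exactly as a serial program would write it.
    """
    nx, ny = global_shape
    itemsize = struct.calcsize("<i")
    shared = bytearray(nx * ny * itemsize)
    for rank in range(proc_grid[0] * proc_grid[1]):
        for offset, count in block_offsets(rank, global_shape, proc_grid, itemsize):
            first = offset // itemsize   # global index of the run's first element
            values = range(first, first + count)
            struct.pack_into(f"<{count}i", shared, offset, *values)
    return bytes(shared)
```

In this sketch each process ends up with several short, non-contiguous runs in the shared file. Computing and issuing those by hand is the bookkeeping that a derived datatype and a file view are intended to take over.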

The second day will concentrate on performance, covering how to configure the parallel file system and tune the MPI-IO library for best performance. Case studies from real codes will be presented.
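As a rough illustration of why file system configuration matters, the sketch below models round-robin striping, the basic layout Lustre uses to spread a file over several object storage targets (OSTs). It is a simplified model written for this description, not a tool from the course: the function names are hypothetical, the OST numbers are indices within the file's own stripe set rather than real OST identifiers, and the stripe size and count are assumed to be whatever was chosen for the file (for example with lfs setstripe).

```python
def ost_index(offset, stripe_size, stripe_count):
    """Index (within the file's stripe set) of the OST holding byte `offset`.

    Simplified model: stripes of `stripe_size` bytes are assigned to the
    file's `stripe_count` OSTs in round-robin order.
    """
    return (offset // stripe_size) % stripe_count


def osts_touched(offset, length, stripe_size, stripe_count):
    """Set of OST indices touched by a contiguous access of `length` bytes."""
    if length <= 0:
        return set()
    first = offset // stripe_size                 # first stripe touched
    last = (offset + length - 1) // stripe_size   # last stripe touched
    if last - first + 1 >= stripe_count:
        # The access spans at least one full round of stripes.
        return set(range(stripe_count))
    return {s % stripe_count for s in range(first, last + 1)}
```

Under this model, a file with a stripe count of 1 sends every access to the same OST however many processes are writing, while a large contiguous access to a file striped over several OSTs can be served by all of them at once.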

Prerequisites: The course assumes a good understanding of basic MPI programming in Fortran, C or C++. Knowledge of MPI derived datatypes would be useful but not essential.


Day 1

09:30 - 10:15 : Parallel IO
10:15 - 11:00 : Practical: Basic IO
11:00 - 11:30 : Break
11:30 - 12:15 : Derived Datatypes for MPI-IO
12:15 - 13:00 : Practical: Derived Datatypes
13:00 - 14:00 : Lunch
14:00 - 14:45 : Basic MPI-IO Routines
14:45 - 15:30 : Practical: Basic MPI-IO
15:30 - 16:00 : Break
16:00 - 16:45 : MPI-IO Features and Alternative Libraries
16:45 - 17:30 : Practical: Alternative Libraries

Day 2

09:30 - 10:15 : Lustre file system on ARCHER
10:15 - 11:00 : Practical: Lustre configuration
11:00 - 11:30 : Break
11:30 - 12:15 : Parallel IO libraries on ARCHER
12:15 - 13:00 : Practical: tuning parallel IO
13:00 - 14:00 : Lunch
14:00 - 14:45 : Case studies
14:45 - 15:30 : Individual consultancy session



David Henty

David teaches on a wide range of EPCC's technical training courses, including MPI and OpenMP, and is overall course organiser for EPCC's MSc in High Performance Computing.


