Mar 20 – 21, 2018
University of Cambridge
Europe/London timezone

Please note: This course takes place in Cambridge.

One of the greatest challenges to running parallel applications on large numbers of processors is how to handle file IO. Standard IO routines are not designed with parallelism in mind, and IO overheads can grow to dominate the overall runtime. Parallel file systems are optimised for large data transfers, but performance can be far from optimal if every process opens its own file or if all IO is funneled through a single master process.

This hands-on course explores a range of issues related to parallel IO. It uses ARCHER and its parallel Lustre file system as a platform for the exercises; however, almost all the IO concepts and performance considerations are applicable to any parallel system.

The IO part of the MPI standard gives programmers access to efficient parallel IO in a portable fashion. However, a large number of different routines are available, and some can be difficult to use in practice. Despite its apparent complexity, MPI-IO adopts a very straightforward high-level model. If used correctly, the library can automatically handle almost all of the complexity of aggregating data from multiple processes.

The first day of the course will cover the MPI-IO standard, developing IO routines for a regular domain decomposition example. It will also briefly cover higher-level standards such as HDF5 and NetCDF, which are built on top of MPI-IO.
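To give a flavour of the regular domain decomposition problem, here is a minimal sketch in plain Python. It is not taken from the course materials and makes no MPI calls: it only simulates the file layout, assuming a global nx-by-ny array of 4-byte integers stored row by row in a single shared file, divided evenly over a px-by-py grid of processes. The function names (local_block, write_local_block, simulate) and the file name global.dat are hypothetical, chosen for illustration.

```python
import os
import struct
import tempfile

ITEM = struct.calcsize("<i")  # 4 bytes per array element


def local_block(rank, nx, ny, px, py):
    """Return (row0, col0, rows, cols) of the block owned by `rank`.

    Assumption: ranks are laid out row-major on a px-by-py process grid
    and the global nx-by-ny array divides evenly across that grid.
    """
    if nx % px or ny % py:
        raise ValueError("global array must divide evenly across the process grid")
    rows, cols = nx // px, ny // py
    pi, pj = divmod(rank, py)
    return pi * rows, pj * cols, rows, cols


def write_local_block(path, rank, nx, ny, px, py):
    """Write one rank's block into the shared file, one row segment at a time."""
    row0, col0, rows, cols = local_block(rank, nx, ny, px, py)
    with open(path, "r+b") as f:
        for i in range(row0, row0 + rows):
            # Each value is its own global row-major index, so the
            # assembled file is easy to check afterwards.
            values = [i * ny + j for j in range(col0, col0 + cols)]
            f.seek((i * ny + col0) * ITEM)
            f.write(struct.pack(f"<{cols}i", *values))


def simulate(nx=4, ny=6, px=2, py=3):
    """Play every 'rank' in turn and return the file contents as a list of ints."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "global.dat")
        with open(path, "wb") as f:
            f.write(b"\0" * (nx * ny * ITEM))  # preallocate the global file
        for rank in range(px * py):
            write_local_block(path, rank, nx, ny, px, py)
        with open(path, "rb") as f:
            return list(struct.unpack(f"<{nx * ny}i", f.read()))


if __name__ == "__main__":
    print(simulate())
```

Each simulated process issues many small, non-contiguous writes, which is the pattern that tends to perform poorly on a parallel file system. In MPI-IO the same layout would typically be described once with a derived datatype (for example via MPI_Type_create_subarray) and installed as a file view with MPI_File_set_view, so that a collective call such as MPI_File_write_all can let the library aggregate the data.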

The second day will concentrate on performance, covering how to configure the parallel file system and tune the MPI-IO library for best performance. Case studies from real codes will be presented.

Prerequisites: The course assumes a good understanding of basic MPI programming in Fortran, C or C++. Knowledge of MPI derived datatypes would be useful but not essential.

Timetable

Day 1

09:30 - 10:15 : Parallel IO
10:15 - 11:00 : Practical: Basic IO
11:00 - 11:30 : Break
11:30 - 12:15 : Derived Datatypes for MPI-IO
12:15 - 13:00 : Practical: Derived Datatypes
13:00 - 14:00 : Lunch
14:00 - 14:45 : Basic MPI-IO Routines
14:45 - 15:30 : Practical: Basic MPI-IO
15:30 - 16:00 : Break
16:00 - 16:45 : MPI-IO Features and Alternative Libraries
16:45 - 17:30 : Practical: Alternative Libraries

Day 2

09:30 - 10:15 : Lustre file system on ARCHER
10:15 - 11:00 : Practical: Lustre configuration
11:00 - 11:30 : Break
11:30 - 12:15 : Parallel IO libraries on ARCHER
12:15 - 13:00 : Practical: tuning parallel IO
13:00 - 14:00 : Lunch
14:00 - 14:45 : Case studies
14:45 - 15:30 : Individual consultancy session

 

Course Materials : http://www.archer.ac.uk/training/course-material/2018/03/parallel-io-camb/index.php

Trainer

David Henty

David teaches on a wide range of EPCC's technical training courses, including MPI and OpenMP, and is overall course organiser for EPCC's MSc in High Performance Computing.

This course is part-funded by the PRACE project and is free to all. Please register using the online form. If you have any questions, please consult the course forum page or contact epcc-support@epcc.ed.ac.uk.