Numerical simulations conducted on current high-performance computing (HPC) systems face an ever growing need for scalability. Larger HPC platforms provide opportunities to push the limitations on size and properties of what can be accurately simulated. Therefore, it is needed to process larger data sets, be it reading input data or writing results. Serial approaches on handling I/O in a parallel application will dominate the performance on massively parallel systems, leaving a lot of computing resources idle during those serial application phases.
In addition to the need for parallel I/O, input and output data is often processed on different platforms. Heterogeneity of platforms can impose a high level of maintenance, when different data representations are needed. Portable, selfdescribing data formats such as HDF5 and netCDF are examples of already widely used data formats within certain communities.
This course will start with an introduction to the basics of I/O, including basic I/O-relevant terms, an overview over parallel file systems with a focus on GPFS, and the HPC hardware available at JSC. Different I/O strategies will be presented. The course will introduce the use of the HDF5, the NetCDF and the SIONlib library interfaces as well as MPI-I/O. Optimization potential and best practices are discussed.
Instructors: Sebastian Lührs, Benedikt Steinbusch, Jülich Supercomputing Centre