Whether you're undertaking data-intensive HPC computations or transferring large files on or off a remote supercomputer, a good understanding of data management best practices will help you make the most of the available resources.
This course covers best practices for data management, using the UK National Supercomputing Service, ARCHER, and its associated Research Data Facility (RDF) archive and Data Analytics Cluster as specific examples.
Day 1 will cover best practices for working with your files on ARCHER. It will describe ARCHER's local file systems and the relationship between ARCHER and the RDF, the associated system for long-term archiving. It will introduce tools such as GridFTP for moving large amounts of data on and off ARCHER and the RDF, as illustrated below. We will also cover data analysis and visualisation on the Data Analytics Cluster attached to the RDF.
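To give a flavour of the Day 1 material, a GridFTP transfer is typically driven from the command line with globus-url-copy. A minimal sketch might look like the following (the host name and paths here are hypothetical, not the actual ARCHER or RDF endpoints):

    globus-url-copy -vb -p 4 file:///work/myproject/results.tar gsiftp://dtn.rdf.example.ac.uk/archive/results.tar

Here -p 4 requests four parallel TCP streams, which is how GridFTP achieves high throughput over wide-area links, and -vb reports transfer performance as the copy proceeds; both are standard globus-url-copy options.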
Day 2 will introduce file formats that have associated libraries for high-performance I/O, such as HDF5 and NetCDF. It will then cover ways to achieve good I/O performance on parallel systems using these formats and the lower-level MPI-IO library, and conclude with general advice on developing practical data management plans for your own research that address the requirements of research funding bodies.
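As a small taste of the Day 2 material, the sketch below writes an array to an HDF5 file in serial C (it assumes the HDF5 library and headers are installed; the file name example.h5 and dataset name /values are purely illustrative):

    #include <hdf5.h>

    int main(void)
    {
        hsize_t dims[1] = {8};
        double  data[8] = {0, 1, 2, 3, 4, 5, 6, 7};

        /* Create a new file, truncating any existing file of the same name */
        hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        /* Describe the shape of the data: a one-dimensional array of 8 values */
        hid_t space = H5Screate_simple(1, dims, NULL);

        /* Create a dataset of native doubles and write the array into it */
        hid_t dset  = H5Dcreate(file, "/values", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

        /* Release handles and close the file */
        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Compiled with the h5cc wrapper supplied with HDF5, this produces a portable, self-describing file that tools such as h5dump can inspect; the course will build from this kind of serial example towards parallel I/O.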
This course is free to all academics.
Attendees are expected to have experience of using desktop computers, but no programming, Linux or HPC experience is necessary.