Data Storage and Management @ EPCC

3305 (EPCC)



The University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ

Whether you're running data-intensive HPC computations or working with large files that need to be transferred to or from a remote supercomputer, a good understanding of data management best practice will help you make the most of the available resources.


This course covers best practice for data management, using the UK's national supercomputer service, ARCHER, and its associated Research Data Facility (RDF) data archive and Data Analytics Cluster (DAC) as specific examples.

Day 1 will cover best practices for working with your files on ARCHER. It will describe ARCHER's local file systems and the relationship between ARCHER and the Research Data Facility, the associated system for long-term archiving. It will introduce tools such as GridFTP for moving large amounts of data on and off ARCHER and the RDF. We will also cover how to do data analysis and visualisation on the Data Analytics Cluster attached to the RDF.

Day 2 will introduce file formats that have associated libraries for high-performance IO (such as HDF5 and NetCDF). We then cover ways to achieve good IO performance on parallel systems using these formats and the lower-level MPI-IO library. We conclude with general advice on how to develop practical data management plans for your own research that are useful and address the requirements of research funding bodies.

This course is free to all academics.


Attendees are expected to have experience of using desktop computers, but no programming, Linux or HPC experience is necessary.

Day 1

    • Introduction to the RDF
    • Practical: log on and run simple programs
    • Coffee
    • Data movement
    • Practical: transfer data between systems
    • Lunch
    • Lecture: DAC tools
    • Practical: use the DAC for post-processing and visualisation
    • Tea
    • Lecture: Data transfer
    • Practical: transfer data to/from external systems

Day 2

    • File formats
    • Practical: HDF5 exercise
    • Coffee
    • Lecture: Parallel file systems
    • Practical: test file system performance
    • Lunch
    • Lecture: Parallel IO
    • Lecture: Practical data management plans for HPC
    • Tea
    • Consultancy session