Please note: the course will be held at EPCC, Edinburgh.
ARCHER, the UK's national supercomputing service, offers training in software development and high-performance computing to scientists and researchers across the UK. As part of this training service, we are running a two-day course, Data Management: IO, Transfer and Storage, at EPCC.
Whether you're undertaking very data-intensive computations or working with large input or output files that need to be transferred to or from an HPC machine, a good understanding of data management best practices will help you make the most of the available resources.
The course is divided into two parts:
The first part will cover best practices for working with your files on ARCHER. It will describe ARCHER's file systems and the relationship between ARCHER and the Research Data Facility (RDF). It will introduce GridFTP as a mechanism for moving large amounts of data on and off ARCHER and the RDF. We'll then introduce file formats, such as HDF5 and NetCDF, that have associated libraries for high-performance IO.
The second part will cover alternative data storage and transmission techniques. It will introduce both traditional "SQL" databases and more modern "NoSQL" databases, which are particularly suited to some big data applications. Finally, we'll cover some of the file formats widely used for working with data on the Internet, such as XML and JSON.