With the advance of new technologies, data volumes and the number of files are constantly increasing. Additionally, new regulations (e.g. GDPR) set strict requirements on the storage and use of privacy-sensitive data. Data management has therefore become an essential part of data-driven research.
In this course we will introduce how to manage data efficiently with the data management framework iRODS (integrated Rule-Oriented Data System) and how to build computational pipelines on an HPC infrastructure that operate on this data. Topics in this course will include:
- Data Life Cycle and FAIR principles: how to make data Findable, Accessible, Interoperable and Reusable
- iRODS: basic concepts and graphical user interface
- How to label and search for data in iRODS
- How to build a computational pipeline that operates on data managed in iRODS