Whenever one has to deal with multiple jobs on an HPC system, automating part or all of the job management process involves describing and implementing so-called 'workflows'. Options for managing workflows are numerous and range from basic scheduler features such as job arrays and job dependencies up to complex systems managed by a central, multi-user database. Moreover, workflow tools are also available from the software development and deployment ecosystem and the whole "devops" movement. This online workshop aims at guiding participants towards the right tool for their use case and at helping them reduce the time they spend managing their jobs, by automating what can be automated and following best practices.
After attending the workshop, you will be able to choose the workflow tool that best fits your use case and to automate the repetitive parts of your job management.
This presentation will cover the two basic building blocks of workflows: job arrays and job dependencies. Job arrays allow creating parametrised jobs that all look identical except for one parameter that varies across the array, while job dependencies enforce a fixed ordering of jobs and make sure each step of the workflow is carried out only when its requirements (input data, software, output directory, etc.) are available. It will also discuss the concepts of micro-scheduling (running multiple small job steps inside a single job allocation) and macro-scheduling (submitting multiple jobs at once with a single command). The presentation will also introduce basic GNU/Linux commands that make micro- and macro-scheduling easier: xargs, seq, GNU Parallel, GNU Make, envsubst, split, mkfifo. The concepts will be illustrated with Slurm but should apply to any other scheduler. Finally, the session will present Maestro, a small workflow manager developed at the same lab Slurm originated from, which focuses on documentation and organisation; it makes it easy to build small workflows without manually submitting the jobs and is a nice complement to the Linux tools mentioned earlier.
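To give a flavour of these building blocks, here is a minimal Slurm sketch; the script names (preprocess.sh, compute.sh, aggregate.sh) and the number of array tasks are made up for the example.

```bash
# Macro-scheduling: submit a preprocessing job and capture its job ID.
prep_id=$(sbatch --parsable preprocess.sh)

# A job array of 10 tasks, each handling one parameter value, started only
# once the preprocessing job has completed successfully.
array_id=$(sbatch --parsable --dependency=afterok:${prep_id} --array=1-10 compute.sh)

# A final job that aggregates the results once every array task has finished.
sbatch --dependency=afterok:${array_id} aggregate.sh

# Micro-scheduling would happen inside compute.sh, for instance by running
# many small steps within the single allocation with GNU Parallel:
#   seq 1 100 | parallel -j "$SLURM_CPUS_PER_TASK" ./process_item.sh {}
# while $SLURM_ARRAY_TASK_ID tells each array task which parameter to use.
```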
This session will discuss one specific type of workflow, checkpoint/restart, and how Linux signals can be leveraged to build self-resubmitting jobs that run longer than the maximum wall time of the cluster.
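A minimal sketch of that pattern with Slurm is shown below; the solver name and the stop.flag convention are hypothetical, and the application is assumed to write a checkpoint when asked and to resume from it at the next run.

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --requeue
# Ask Slurm to send SIGUSR1 to the batch shell 5 minutes before the time limit.
#SBATCH --signal=B:USR1@300

# On the warning signal: ask the application to checkpoint, then requeue this
# very job so it restarts from the latest checkpoint in a new allocation.
trap 'touch stop.flag; scontrol requeue "$SLURM_JOB_ID"; exit 0' USR1

# Run the (hypothetical) solver in the background so the shell can catch the
# signal while waiting for it.
./solver --resume-if-checkpoint &
wait
```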
This presentation will introduce atools, a collection of tools that helps build and manage large job arrays for parametrised studies. Such workflows can be referred to as "wide" workflows: many similar sibling jobs, with no dependencies among them.
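To illustrate the kind of "wide" workflow atools targets, here is a plain-Slurm sketch that maps each array task to one line of a hypothetical parameters.csv file; atools provides commands that automate exactly this kind of bookkeeping (and more, such as tracking failed tasks), which are not reproduced here.

```bash
#!/bin/bash
#SBATCH --array=1-100

# parameters.csv (hypothetical): a header line, then one parameter set per
# line, e.g. "temperature,pressure". Each array task reads the line matching
# its index (+1 to skip the header).
line=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" parameters.csv)
temperature=$(echo "$line" | cut -d, -f1)
pressure=$(echo "$line" | cut -d, -f2)

./simulate --temperature "$temperature" --pressure "$pressure" \
    > "result_${SLURM_ARRAY_TASK_ID}.out"
```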
This session will discuss Makeflow, a tool that can be used to model workflows with many dependencies among jobs. Such workflows can be referred to as "deep" workflows, by contrast with the "wide" workflows described earlier.
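As a rough idea of what such a "deep" workflow looks like, the sketch below writes a small Makeflow file (the rule syntax is Make-like: targets, their prerequisites, then the command; exact details such as tab indentation should be checked against the Makeflow documentation) and runs it so that each rule is submitted as a Slurm job. All file and program names are made up.

```bash
# Two analysis steps depend on a preprocessing step, and a summary depends on both.
cat > example.makeflow <<'EOF'
clean.dat: raw.dat
    ./preprocess raw.dat > clean.dat

stats_a.txt: clean.dat
    ./analyse --mode a clean.dat > stats_a.txt

stats_b.txt: clean.dat
    ./analyse --mode b clean.dat > stats_b.txt

summary.txt: stats_a.txt stats_b.txt
    cat stats_a.txt stats_b.txt > summary.txt
EOF

# Let Makeflow drive the execution, submitting each rule to Slurm.
makeflow -T slurm example.makeflow
```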
This presentation will be about Singularity: how to build containers and deploy them on clusters so as to install software in a uniform way, without being limited by the Linux flavour of the cluster or the software modules available on it.
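For instance, a minimal (hypothetical) definition file can start from a Docker image, install the required software, and be built into a single image file that is then copied to the cluster and run there:

```bash
cat > mytool.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%post
    apt-get update && apt-get install -y python3

%runscript
    exec python3 "$@"
EOF

# Build the image (on a machine where you have root or fakeroot rights),
# then transfer the resulting .sif file to the cluster and use it there.
singularity build --fakeroot mytool.sif mytool.def
singularity exec mytool.sif python3 --version
```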
This session will be about GitLab and its continuous integration/continuous deployment (CI/CD) features, and how it can be used on clusters from a regular user account to automatically compile software and even submit benchmark jobs whenever new features or improvements are added to the software you are writing.
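The GitLab pipeline configuration itself is not shown here; the sketch below is merely the kind of shell script such a CI job could execute on the cluster after each push, with hypothetical build targets and job script names.

```bash
#!/bin/bash
# Rebuild the software from the freshly checked-out sources and submit a
# benchmark run; abort on the first error so the pipeline fails visibly.
set -e

make clean
make -j 4

# --wait makes sbatch block until the benchmark job ends, so the CI job's
# exit status reflects the benchmark's success or failure.
sbatch --wait benchmark.sh
```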
We are back to scientific workflows with the seventh presentation, a tutorial on Snakemake, a tool that is a bit more complex to use than atools and Makeflow but that can handle both wide and deep workflows, and offers additional features such as templating, container support, etc.
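As a taste of what a Snakemake workflow looks like, here is a tiny, hypothetical Snakefile combining a "wide" part (one rule applied to several samples) and a "deep" part (a summary rule depending on all of them); Snakemake infers the dependency graph from the input/output declarations.

```bash
cat > Snakefile <<'EOF'
SAMPLES = ["a", "b", "c"]

rule all:
    input: "summary.txt"

rule count_lines:
    input: "data/{sample}.txt"
    output: "counts/{sample}.txt"
    shell: "wc -l {input} > {output}"

rule summarize:
    input: expand("counts/{sample}.txt", sample=SAMPLES)
    output: "summary.txt"
    shell: "cat {input} > {output}"
EOF

# Run locally with two parallel jobs (cluster execution is configured separately).
snakemake --cores 2
```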