Using a Workflow Management System

Functionality

We provide a modulefile

tools/mflow

which is a thin wrapper around the snakemake workflow system. While it is possible to use snakemake directly after loading the module, mflow provides a curated cluster configuration for each of its supported workflows.

Using MFLOW

To obtain usage help, you can run

$ mflow -h

This lists every option supported by mflow.

To see which workflows are available, run:

# will list all available workflows - sorted by topic
$ mflow --list-workflows

Each supported workflow provides an annotation file with supplementary information. Its name is shown in the second column of the output of '--list-workflows'. You can look it up with

$ mflow --show-annotation <annotation>

Most workflows require you to edit and provide a configuration file manually. This file contains all the information about the input(s) and the software modules to be used. While snakemake typically uses Conda (mostly Bioconda) packages, in an HPC environment it is better to use environment modules for performance reasons.

We provide sample configurations, which can be obtained with

$ mflow --show-annotation <workflow name>

The output can be redirected into a file for you to edit with

$ mflow --show-annotation <workflow name> > my_sample_configuration.cfg
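
Note that redirecting with '>' silently overwrites an existing file, which would discard any edits already made to it. The following is a minimal sketch of a guard, not part of mflow: the function name save_sample_config is hypothetical, and only the option --show-annotation is taken from mflow itself.

```shell
# Hypothetical helper (not provided by mflow): write the sample configuration
# of a workflow to a file, but refuse to overwrite a file that already
# exists, because it may contain manual edits.
save_sample_config() {
    workflow="$1"   # workflow name, as used with --show-annotation
    target="$2"     # file to create
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing file: $target" >&2
        return 1
    fi
    mflow --show-annotation "$workflow" > "$target"
}
```

After defining the function in your shell, it could be used like this:

$ save_sample_config <workflow name> my_sample_configuration.cfg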

Because a cluster specific configuration is provided for every curated workflow, most workflows simply require you to

  • provide the SLURM account of a project,
  • edit and select a workflow specific configuration file, and
  • select the desired workflow.

A typical call looks like

$ mflow -A <account> -w <workflow> --configfile <workflow specific configuration file>
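
The three ingredients of the typical call can be checked before anything is handed over to mflow. This is a sketch only: the function name run_workflow is hypothetical, while the options -A, -w and --configfile are the ones shown in the typical call.

```shell
# Hypothetical wrapper around the typical call: verify that account,
# workflow and configuration file are present before calling mflow.
run_workflow() {
    account="$1"      # SLURM account of the project
    workflow="$2"     # name as shown by 'mflow --list-workflows'
    configfile="$3"   # edited workflow specific configuration file
    if [ -z "$account" ] || [ -z "$workflow" ]; then
        echo "usage: run_workflow <account> <workflow> <configfile>" >&2
        return 2
    fi
    if [ ! -r "$configfile" ]; then
        echo "configuration file not readable: $configfile" >&2
        return 1
    fi
    shift 3
    # Any remaining arguments are passed on to mflow unchanged.
    mflow -A "$account" -w "$workflow" --configfile "$configfile" "$@"
}
```

A missing or mistyped configuration file is then reported immediately instead of surfacing later during the run.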

Some workflows additionally require a rule specific cluster configuration. It can be supplied with --cluster-config <path to workflow specific configuration file>

Reproducibility

It may seem cumbersome at first to edit a configuration file manually. But the file not only serves to provide and select all necessary input, it also documents your work: Which software versions have been used? What exactly was the selected input?
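
To keep this record once the file is edited again for a later run, a dated copy can be stored alongside the results. The following is one possible sketch and not an mflow feature; the function name archive_config and the directory layout are assumptions made for illustration.

```shell
# Hypothetical helper (not part of mflow): keep a time-stamped copy of the
# configuration file that was used for a run.
archive_config() {
    configfile="$1"    # configuration file used for the run
    archive_dir="$2"   # directory in which the copies are collected
    mkdir -p "$archive_dir" || return 1
    stamp="$(date +%Y%m%d-%H%M%S)"
    cp "$configfile" "$archive_dir/$stamp-$(basename "$configfile")"
}
```

It could be called right before starting a workflow:

$ archive_config my_sample_configuration.cfg <directory for archived configurations>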

Testing Workflows and Other Snakemake Instructions

To pass parameters to snakemake itself use

$ mflow ... -- <list of snakemake parameters>

A useful application is

$ mflow ... -- --dry-run

to test a given workflow without executing it. Note that for a dry run some parameters, such as account, configuration and workflow, need to be present, too.
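
Because a dry run is cheap, it can serve as a gate before the real run. The following sketch assumes that mflow returns a non-zero exit status when the snakemake dry run fails, which has not been verified here; the function name dry_run_then_submit is hypothetical.

```shell
# Hypothetical helper: perform a dry run with the full set of mflow
# arguments and start the real run only if the dry run succeeded.
# Assumes the given arguments do not already contain a '--' separator.
dry_run_then_submit() {
    if mflow "$@" -- --dry-run; then
        mflow "$@"
    else
        echo "dry run failed, nothing submitted" >&2
        return 1
    fi
}
```

The arguments are the same as for a normal call:

$ dry_run_then_submit -A <account> -w <workflow> --configfile <workflow specific configuration file>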

Job Output

Each workflow will write its (scientific) output to the locations specified in the configuration file. Curated workflows differentiate between

  • cached output, e.g. read mapping indices, downloaded reference / input files, etc. Caching between workflows saves time, and curated workflows ensure it by their layout.
  • temporary output files: these intermediate files are deleted once they are no longer needed as input. They can easily be re-generated and are stored temporarily on the scratch file system.
  • final results as specified in the workflow specific configuration.

Call for Collaboration

mflow development takes place on the RLP GitLab server. All contributions are welcome. To contribute, you have a number of options:

  • This applies for contributing new workflows, too.
  • Get in touch with us to start a new co-supervised Bachelor's or Master's thesis together.
  • Contribute to the documentation, here in the wiki or by writing issue reports.

mflow will be under constant maintenance. You can ask about its usage via the usual channels: our Mattermost channel or mail to the HPC group at hpc@uni-mainz.de.

However, issues related to mflow itself should be reported on its project page for better overview and tracking.

  • Last modified: 2021/02/12 11:55
  • by meesters