
# Functionality

We provide a modulefile

tools/mflow

which is a thin wrapper around the snakemake workflow system. While you can use snakemake directly after loading the module, mflow provides a curated cluster configuration for each of its supported workflows.

# Using MFLOW

To obtain usage help, you can run

$ mflow -h

which lists every option supported by mflow. To see which workflows are available, run

$ mflow --list-workflows

This lists all available workflows, sorted by topic.

Each supported workflow provides an annotation file with supplementary information. Its name is shown in the second column of the '--list-workflows' output. You can look it up with

$ mflow --show-annotation <annotation>

Most workflows require you to manually edit and provide a configuration file. This file contains all the information about the input(s) and the software modules used. While snakemake uses Conda (mostly Bioconda) packages, in an HPC environment it is better to use environment modules for performance reasons. We provide sample configurations, which can be obtained with

$ mflow --show-configuration <workflow name>

This prints the configuration to the terminal and writes a file <workflow name>.yaml, which you then edit to match your input.

Because every curated workflow comes with a cluster-specific configuration, most workflows simply require you to

• provide the SLURM account of a project,
• edit and select a workflow-specific configuration,
• select the desired workflow itself.

A typical call looks like

$ mflow -A <account> -w <workflow> --configfile <workflow specific configuration file>

Some workflows also require a rule-specific cluster configuration. This can be provided using

--cluster-config <path to workflow specific configuration file>

#### Reproducibility

It may seem cumbersome at first to edit a configuration file manually. But the file does more than provide and select all necessary input. It also serves as a record for you: Which software versions were used? Which input exactly was selected? And so on.

As soon as you log out or the terminal running the workflow loses its connection, the workflow is aborted. Already submitted jobs are not affected; they keep running. However, the workflow will not be aware of them and might re-submit them when triggered again.

It is also possible to start mflow in nohup mode:

$ mflow --nohup ...

If you invoke mflow in this mode, there will be no further output; snakemake runs in the background. Use this only for well-established workflows, not while designing a workflow.
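The typical call described in this section can be assembled by a small shell helper. The following is only a sketch: the function name `mflow_typical_call` is hypothetical and not part of mflow, and it merely prints the command line after checking that the edited configuration file exists. It uses only the options documented on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of mflow): print the typical mflow call
# after checking that the edited configuration file exists.
# Arguments: <account> <workflow> <configfile>
mflow_typical_call() {
    if [ "$#" -ne 3 ]; then
        echo "usage: mflow_typical_call <account> <workflow> <configfile>" >&2
        return 2
    fi
    account=$1
    workflow=$2
    configfile=$3
    # A missing configuration file is the most likely mistake, so check first.
    if [ ! -f "$configfile" ]; then
        echo "configuration file not found: $configfile" >&2
        return 1
    fi
    # Print the command instead of running it, so it can be reviewed.
    printf 'mflow -A %s -w %s --configfile %s\n' "$account" "$workflow" "$configfile"
}
```

Printing the command rather than executing it lets you review the call before submitting anything to the cluster.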

# Where to work & Reporting Errors

Consider working in your home directory: all temporary files are deleted automatically after 10 days. Working in your home directory will therefore prevent cluttering your group's project directory.

Please consider the difference between cluster related issues / errors and workflow related errors. In order to sort out the issues and to come to a quick solution:

• Mail all cluster related issues to our HPC ticket system or approach us on our mattermost channel.
• If there is an error in a workflow, please try to comprehensively summarize it and open an issue on the project page. To do this, click 'New Issue'.
• Indicate the error message, provide context and show the input configuration and the error log file (attach the respective files).
• Please always use the current mflow version; otherwise we may not be able to provide support.

To pass parameters to snakemake itself use

$ mflow ... -- <list of snakemake parameters>

A useful application is running

$ mflow ... -- --dry-run

to test a given workflow without executing it. Note that a dry run still needs some parameters, such as the account, the configuration and the workflow, to be present.
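One way to make the dry run a habit is to wrap both calls in a shell function. This is a minimal sketch under the assumption that mflow returns a non-zero exit status when the dry run fails; the function name `mflow_checked_run` is hypothetical and not part of mflow.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of mflow): perform a dry run first and
# start the workflow only if the dry run succeeds.
# All arguments are passed to mflow unchanged, for example
#   -A <account> -w <workflow> --configfile <file>
mflow_checked_run() {
    # Everything after "--" is handed to snakemake; --dry-run executes nothing.
    if ! mflow "$@" -- --dry-run; then
        echo "dry run failed, not starting the workflow" >&2
        return 1
    fi
    # Same arguments again, this time without the dry-run flag.
    mflow "$@"
}
```

The same account, workflow and configuration arguments are reused for both calls, which matches the requirement that a dry run needs them as well.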

If a workflow or a workflow job is aborted, incomplete files can be left behind. A rerun can be triggered with

$ mflow ... -- --rerun-incomplete

Each workflow will write its (scientific) output to the locations specified in the configuration file. Curated workflows differentiate between

• cached output, e.g. read mapping indices, downloaded reference / input files, etc. This caching between workflows saves time, and curated workflows ensure it through their layout.
• temporary output files: these intermediate files are deleted once they are no longer needed as input. They can easily be re-generated and are stored temporarily on the scratch file system.
• final results as specified in the workflow specific configuration.

# Provided Workflows

| Topic | Workflow Name | Core Applications |
|---|---|---|
| Structure Based Ligand Screening | StructureBasedScreening | OpenBabel, Modeller, VinaLC |
| ProteoTranscriptomics | ProteoTrans | Blast, MaxQuant, Trinity |

# Call for Collaboration

mflow development takes place at the RLP gitlab server. All contributions are welcome. To contribute you have a number of options:

• This applies for contributing new workflows, too.
• Get in touch with us to start a new co-supervised Bachelor's or Master's thesis together.
• Contribute to documentation - here in the wiki or writing issue reports.

Any workflow-related issues or issues of mflow itself should be reported on its project page for better overview and tracking.

HPC-related issues can be reported using the usual channels, our mattermost channel or via mail to the HPC group: hpc@uni-mainz.de.
