====== Node-local scheduling ======
There are some use cases where you may want to simply request a **full cluster node** from Slurm and then run **many** //(e.g. many more than 64)// **small** //(e.g. each taking only a fraction of the total job runtime)// tasks on this full node. You will then need some **local scheduling** on this node to ensure proper utilization of all cores.
To accomplish this, we suggest you use the [[https://www.gnu.org/software/parallel/|GNU parallel]] tool.
Assume, as an example, that there are 150 input files in a directory which all need to be processed, one file per task.
Now, of course, we could submit 150 jobs using Slurm, or we could use one job which processes the files one after another, but the most elegant way is to submit one job for 64 cores (e.g. a whole node on Mogon I) and process the files in parallel on that node.

<file bash parallel_job>
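#!/bin/bash
# NOTE: the body of the original "parallel_job" listing is not reproduced in
# this fragment. The following is a minimal sketch only: the partition, the
# run time, the input pattern "input_*.dat" and the single-core program
# "./process_one" are assumptions -- adjust them to your own setup.

#SBATCH --job-name=parallel_job
#SBATCH --output=parallel_job_%j.log
#SBATCH -N 1                  # request one full node
#SBATCH -p nodeshort          # full-node partition on Mogon I
#SBATCH -A <your account>
#SBATCH -t 120

module purge
module load tools/parallel    # module name may differ on your system

# let GNU parallel keep all cores of the node busy: at most
# $SLURM_CPUS_ON_NODE tasks run concurrently; {} is replaced by one
# input file name per task
parallel -j "$SLURM_CPUS_ON_NODE" ./process_one {} ::: input_*.dat
</file>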
==== An example showing the use of functions, variables and redirection ====
This example shows how to use user-defined functions, variables and anonymous pipes (process substitution) in bash. It uses BWA and SAMtools to align short reads against a reference genome.

  * Note that the script sets ''set -x'', so every executed command is echoed into the job log.
  * Also note that information about the function is carried to the sub-shells with the ''--env'' option of GNU parallel (together with ''export -f'' in bash).
  * Variables which are not given on the GNU ''parallel'' command line have to be exported to be visible in the sub-shells (here: ''OUTPUTDIR'').
  * Positional variables (here, just ''$1'', the file holding the forward reads) are supplied by GNU ''parallel'' from the argument list.

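As a minimal, stand-alone illustration of this mechanism (the function name ''greet'' is made up for this sketch and is not part of the alignment example):

<code bash>
# define a bash function and export it, so that sub-shells can see it
greet() { echo "processing $1 on $(hostname)"; }
export -f greet

# --env carries the exported function into the shells started by GNU parallel
parallel --env greet -j 4 greet ::: a b c d
</code>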
//In particular//:

<file bash>
#!/bin/bash

#SBATCH --job-name=bwa_demo_gnu_parallel
#SBATCH --output=res_bwa_gnu_parallel.log
#SBATCH -N 1
#SBATCH --time=300
#SBATCH -p nodeshort
#SBATCH -A <your account>
#SBATCH --gres=ramdisk:<size, big enough to hold the reference>

# This script was written by Christian Meesters (HPC-team, ZDV, Mainz)
#
# Please note: It is valid for Mogon I. The following restrictions apply:
# - if the fastq files in the defined input directory are big, the given
#   run time and ramdisk size may not be sufficient -- test with
#   a data subset first.

# in order to see the output of all commands, we set this:
set -x

#1 we purge all possibly loaded modules to avoid a mangled setup
module purge

#2 we load our GNU parallel module (latest version)
module load tools/parallel

#3 in order to perform our alignment (here: bwa mem) and the subsequent
#  conversion to BAM we load bwa and samtools
module load bio/BWA
module load bio/samtools

#4 make the pipeline return the exit status of the last command that failed
set -o pipefail

#5 set a path to the reference genome and extract its directory path
REFERENCEGENOME="<path to your reference genome (fasta)>"
REFERENCEDIR=$(dirname "$REFERENCEGENOME")

#6 select a base directory for all input and traverse through it:
INPUTBASEDIR=./<your input directory>
#6b now we gather all input files:
#   - a first fastq file (ending on _1.fastq or _1.fastq.gz)
#   - its mate (ending on _2.fastq or _2.fastq.gz)
#   If your file names use a different scheme, adjust this script.
FORWARD_READS=$(find -L "$INPUTBASEDIR" -type f -name '*_1.fastq*')

#7 create an output directory, here: named according to the bwa version
BWA_VERSION=$(bwa |& grep Version | cut -d ' ' -f2 | cut -d '-' -f1)
export OUTPUTDIR="./bwa_${BWA_VERSION}_output"

if [ ! -d "$OUTPUTDIR" ]; then
    mkdir -p "$OUTPUTDIR"
fi

#8 copy the reference to the ramdisk
#  (assumption: the ramdisk requested via --gres is mounted below
#   /localscratch/${SLURM_JOB_ID}; adjust the path if necessary)
NEWREFERENCEDIR=/localscratch/${SLURM_JOB_ID}/ramdisk
mkdir -p $NEWREFERENCEDIR

for FILE in $REFERENCEDIR/*; do
    sbcast -f $FILE $NEWREFERENCEDIR/$(basename $FILE)
done

REFERENCEGENOME=$NEWREFERENCEDIR/$(basename $REFERENCEGENOME)
REFERENCEDIR=$NEWREFERENCEDIR

#9 create an alignment function with the appropriate calls for bwa and samtools
function bwa_aln {
    TEMPOUT=$(basename $1)
    # check the file ending: is the input compressed (ending on gz)?
    # note: the original also contained a commented-out two-step
    #       bwa aln / bwa sampe alternative
    if [ "${TEMPOUT##*.}" == "gz" ]; then
        bwa mem -M -t 8 $REFERENCEGENOME <(zcat $1) <(zcat ${1/_1/_2}) | \
        samtools view -Shb /dev/stdin > "${OUTPUTDIR}/${TEMPOUT%%_1*}.bam"
    else
        bwa mem -M -t 8 $REFERENCEGENOME $1 ${1/_1/_2} | \
        samtools view -Shb /dev/stdin > "${OUTPUTDIR}/${TEMPOUT%%_1*}.bam"
    fi
}

#9b we need to export this function, such that all sub-processes will see it
#   (this only works in bash)
export -f bwa_aln

# finally we start processing:
# bwa mem runs with 8 threads per call and 64 / 8 = 8 calls run in parallel.
# This results in a little oversubscription, since besides bwa (8 threads)
# samtools runs, too.
# Note the ungrouping of output with the -u option.
parallel -v -u --env bwa_aln --no-notice -j 8 bwa_aln ::: $FORWARD_READS
</file>
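Saved e.g. as ''bwa_demo_gnu_parallel.sh'' (the file name is arbitrary), the script is submitted like any other batch job:

<code bash>
sbatch bwa_demo_gnu_parallel.sh
</code>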
===== Running on several hosts =====
We do not recommend supplying a hostlist to GNU parallel (e.g. with its ''--sshlogin'' option). Instead, we recommend a setup similar to the following:

<file bash multi_host>
#!/bin/bash
#SBATCH -J <your meaningful job name>
#SBATCH -A <your account>
#SBATCH -p parallel
#SBATCH --nodes=3  # appropriate number of nodes
#SBATCH -n 24      # example value for Mogon I, see below
#SBATCH -t 300
#SBATCH -c 8       # we assume an application which scales to 8 threads, but
#                  # -c / --cpus-per-task can be adjusted
#                  # or set to a different value.
#SBATCH -o <your logfile prefix>
# adjust / overwrite those two commands to enhance readability & overview
# parameterize srun: start one task at a time on one node with the requested
# number of CPUs per task
# NOTE: the exact options below are an assumption (typical pattern) -- adjust them
srun="srun --nodes=1 --ntasks=1 --cpus-per-task=$SLURM_CPUS_PER_TASK --exclusive"
# parameterize parallel: run as many tasks concurrently as requested with -n,
# keep a joblog to be able to resume (file name is a placeholder)
parallel="parallel --delay 0.2 -j $SLURM_NTASKS --joblog parallel_joblog --resume"

# NOTE: the remainder of the original listing is not shown here; a typical
# invocation with the two helpers defined above looks like the following,
# where "./runtask" and the argument list {1..150} are placeholders:
$parallel "$srun ./runtask {1}" ::: {1..150}
</file>
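In this setup, GNU parallel only acts as a local task queue: each individual task is launched through ''srun'', which places it onto one of the allocated nodes, so no ssh access between the nodes is needed.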
<WRAP center round info 95%>
The number of tasks (given by ''-n'') multiplied by the number of CPUs per task (given by ''-c'' / ''--cpus-per-task'') should match the total number of cores on the requested nodes:
<code bash>
# ensure that this product equals the total number of cores in your allocation
((SLURM_CPUS_PER_TASK * SLURM_NTASKS))
</code>
</WRAP>
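As a worked example for the header above on Mogon I (64 cores per node): ''--nodes=3'' provides 3 × 64 = 192 cores, which is matched by ''-n 24'' tasks × ''-c 8'' CPUs per task = 192.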
====== SLURM multiprog for uneven arrays ======
The [[https://slurm.schedmd.com/srun.html|srun]] option ''--multi-prog'' lets every task of a job step run its own program (or the same program with different arguments), which can be used to schedule unevenly sized work packages within a single allocation:
<file bash master_slave_simple.sh>
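#!/bin/bash
# NOTE: the body of the original master_slave_simple.sh is not reproduced in
# this fragment. The following is a minimal sketch of the --multi-prog
# pattern only; the file name multi.conf and the programs ./master and
# ./slave are placeholders.

#SBATCH -J master_slave_simple
#SBATCH -A <your account>
#SBATCH -p nodeshort
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -t 60

# one line per task rank: rank 0 acts as the master, all other ranks as
# slaves; %t is replaced by the task rank at run time
cat > multi.conf <<'EOF'
0    ./master
1-3  ./slave --rank %t
EOF

# start all tasks of the job step according to the configuration file
srun --multi-prog multi.conf
</file>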