====== Node-local scheduling ======
  
There are some use cases where you would want to simply request a **full cluster node** from Slurm and then run **many** //(e.g. many more than 64)// **small** //(e.g. each only a fraction of the total job runtime)// tasks on this full node. You will then need some **local scheduling** on this node to ensure proper utilization of all cores.
  
To accomplish this, we suggest you use the [[http://www.gnu.org/software/parallel/|GNU Parallel]] program. The program is installed to ''/cluster/bin'', but you can also simply load the [[modules|modulefile]] ''software/gnu_parallel'', which additionally gives you access to its man page.
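To see what such node-local scheduling amounts to, here is a minimal pure-bash sketch (the core count and the ''task'' function are made up for illustration); GNU Parallel does the same job far more robustly, adding load balancing and output handling:

```shell
#!/bin/bash
# Illustrative sketch only: run many small tasks, but never more than
# $ncores at a time. This is what GNU parallel automates for you.
ncores=4                           # pretend the node has 4 cores
task() { echo "task $1 done"; }    # stand-in for a real workload

i=0
for input in a b c d e f g h; do
    task "$input" &                # run in the background ...
    if (( ++i % ncores == 0 )); then
        wait                       # ... but wait after every $ncores tasks
    fi
done
wait                               # collect the remaining background jobs
```

Note that this naive batching wastes time whenever one task in a batch runs much longer than the others; GNU Parallel refills free slots immediately instead.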
</file>
  
Now of course we could submit 150 jobs using Slurm, or use one job which processes the files one after another, but the most elegant way is to submit one job for 64 cores (e.g. a whole node on Mogon I) and process the files in parallel. This is especially convenient, since we can then use the ''nodeshort'' queue, which has better scheduling characteristics than ''short'' (while both show better scheduling compared to their ''long'' counterparts):
  
<file bash parallel_job>
  
  
==== An example showing the use of functions, variables and redirection ====
  
This example shows how to use user-defined functions, variables and anonymous pipes in bash. It uses [[http://bio-bwa.sourceforge.net/|bwa]], a bioinformatics tool which maps unknown DNA sequences to a given reference. It was chosen because it requires different kinds of inputs: a reference file or directory and at least two input files. These input files must not be compressed, but, as the script shows, decompression by means of gzip/zcat and redirection is a working solution.

  * Note that this example sets ''set -x'' to print command traces. This is intended merely to ease comprehension; for a script in production, this line should be commented out.
  * Also note that information about the function is carried to the sub-shells with the ''export -f'' statement.
  * Variables which are not stated upon the call of GNU ''parallel'' can be made available to a function with additional ''export'' statements (here: ''$OUTPUTDIR'').
  * Positional variables (here, just ''$1'' in our example function) are supplied by the call through GNU ''parallel''. This is useful for parameters which change for every iteration, here: the input file name.

//In particular//: bwa can be slow on files bigger than a few GB. Hence the proposed trade-off between threads and concurrently running processes, the load balancing, might not be optimal. (Many smaller files will probably be analyzed faster with only 4 threads and 16 concurrently running processes (the ''-j'' option).)

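The script below leans on two bash parameter expansions; here they are in isolation, on a hypothetical file name:

```shell
#!/bin/bash
# Two bash parameter expansions used by the script, shown on a
# hypothetical file name:
f="input/sample_1.fastq"

# substitution: derive the mate's name from the forward read's name
mate="${f/_1/_2}"
echo "$mate"                       # input/sample_2.fastq

# suffix removal: build the output name from the input's basename
t=$(basename "$f")
echo "${t%_1.fastq}_aligned.bam"   # sample_aligned.bam
```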
<file bash>
#!/bin/bash

#SBATCH --job-name=bwa_demo_gnu_parallel
#SBATCH --output=res_bwa_gnu_parallel.log
#SBATCH -N 1
#SBATCH --time=300
#SBATCH -p nodeshort
#SBATCH -A <your account>
#SBATCH --gres=ramdisk:30G


# This script is written by Christian Meesters (HPC-team, ZDV, Mainz)

# Please note: It is valid for Mogon I. The following restrictions apply:
# - if the fastq-files in the defined input directory are big, the given
#   memory might not be sufficient. In this case restrict yourself to
#   a data subset.

# in order to see the output of all commands, we set this:
set -x

#1 we purge all possibly loaded modules to avoid a mangled setup
module purge

#2 we load our GNU parallel module (latest version)
module load tools/parallel

#3 in order to perform our alignment (here: bwa mem) and subsequent
#  conversion we load bwa and samtools
module load bio/BWA
module load bio/SAMtools

#4 make a pipeline return the exit status of the last command that failed
set -o pipefail

#5 set a path to the reference genome and extract its directory path
REFERENCEGENOME="reference/hg19.fasta"
REFERENCEDIR=$(dirname $REFERENCEGENOME)

#6 select a base directory for all input and traverse through it:
INPUTBASEDIR=./input
#6b now we gather all input files:
#  - a first fastq file (ending on _1.fastq)
#  - its mate (ending on _2.fastq)
# If your file names use a different scheme, adjust this script
FORWARD_READS=$(find -L $INPUTBASEDIR -type f -name '*_1.fastq*')

#7 create an output directory, here: according to bwa and samtools versions
BWA_VERSION=$(bwa |& grep Version | cut -d ' ' -f2 | cut -d '-' -f1)
export OUTPUTDIR="bwa${BWA_VERSION}_samtools_${SLURM_JOB_ID}"

if [ ! -d "$OUTPUTDIR" ]; then
    mkdir -p "$OUTPUTDIR"
fi

#8 copy the reference to the ramdisk
NEWREFERENCEDIR=/localscratch/${SLURM_JOB_ID}/ramdisk/reference
mkdir -p $NEWREFERENCEDIR

for FILE in $REFERENCEDIR/*; do
  sbcast -f $FILE $NEWREFERENCEDIR/$(basename $FILE)
done

REFERENCEGENOME=$NEWREFERENCEDIR/$(basename $REFERENCEGENOME)
REFERENCEDIR=$NEWREFERENCEDIR

#9 create an alignment function with the appropriate calls for bwa and samtools
function bwa_aln {
  TEMPOUT=$(basename $1)
  # check the file ending: is the file gzipped?
  if [[ "$1" == *.gz ]]; then
        #bwa sampe $REFERENCEGENOME <( bwa aln -t 4 $REFERENCEGENOME <(zcat $1) ) \
        #                           <( bwa aln -t 4 $REFERENCEGENOME <(zcat ${1/_1/_2} ) ) \
        #                           <(zcat $1) <(zcat ${1/_1/_2}) | \
        #samtools view -Shb /dev/stdin > "$OUTPUTDIR/${TEMPOUT%_1.fastq.gz}_aligned.bam"
        bwa mem -M -t 8 $REFERENCEGENOME <(zcat $1) <(zcat ${1/_1/_2}) |
        samtools view -Shb /dev/stdin > "$OUTPUTDIR/${TEMPOUT%_1.fastq.gz}_aligned.bam"
  else
        #bwa sampe $REFERENCEGENOME <( bwa aln -t 4 $REFERENCEGENOME $1 ) \
        #                           <( bwa aln -t 4 $REFERENCEGENOME ${1/_1/_2} ) \
        #                           $1 ${1/_1/_2} | \
        bwa mem -M -t 8 $REFERENCEGENOME $1 ${1/_1/_2} |
        samtools view -Shb /dev/stdin > "$OUTPUTDIR/${TEMPOUT%_1.fastq}_aligned.bam"
  fi
}

#9b we need to export this function, such that all subprocesses will see it (only works in bash)
export -f bwa_aln

# finally we start processing:
#  we take 8 threads for each call of bwa mem, and 64 / 8 = 8 concurrently
#  running processes (the -j option). This results in a little oversubscription,
#  as samtools runs in addition to bwa.
#  Note the ungrouping of output with the -u option.
parallel -v -u --env bwa_aln --no-notice -j 8 bwa_aln ::: $FORWARD_READS
</file>
===== Running on several hosts =====
  
We do not recommend supplying a hostlist to GNU parallel with the ''-S'' option, as GNU parallel attempts to ssh to the respective nodes (including the master host) and therefore loses the environment. You can script around this, but you will run into a quotation hell.

Instead, we recommend a setup similar to:
  
<file bash multi_host>
#SBATCH -p parallel  # for Mogon II
#SBATCH --nodes=3 # appropriate number of Nodes
#SBATCH -n 24    # example value for Mogon I, see below
#SBATCH -t 300
#SBATCH -c 8 # we assume an application which scales to 8 threads, but
             # -c / --cpus-per-task can be omitted (default is 1)
             # or set to a different value.
#SBATCH -o <your logfile prefix>_%j.log
  
#adjust / overwrite those two commands to enhance readability & overview
# parameterize srun
srun="srun -N1 -n 1 -c $SLURM_CPUS_PER_TASK --jobid $SLURM_JOBID --cpu_bind=q --mem-per-cpu=$((SLURM_MEM_PER_NODE / SLURM_NTASKS))"
# parameterize parallel
parallel="parallel -j $SLURM_NTASKS --no-notice"
</file>
  
<WRAP center round info 95%>
The number of tasks (given by ''-n'') times the number of CPUs per task (given by ''-c'') needs to equal the number of nodes (given by ''-N'') times the number of CPUs per node (to be inferred from ''scontrol show node <nodename>'' or from the [[nodes|wiki]]). Or (in pseudo bash):

<code bash>
# ensure that
$((SLURM_CPUS_PER_TASK * SLURM_NTASKS)) -eq $((SLURM_CPUS_ON_NODE * SLURM_JOB_NUM_NODES))
</code>
</WRAP>
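For the three-node example above (assuming 64-core Mogon I a-nodes and 8 CPUs per task), this arithmetic can be checked directly:

```shell
#!/bin/bash
# Sanity check for the example job script: 3 nodes with 64 cores each
# (Mogon I a-node assumption), 8 CPUs per task -- how many tasks fit?
NODES=3
CPUS_PER_NODE=64
CPUS_PER_TASK=8
NTASKS=$(( NODES * CPUS_PER_NODE / CPUS_PER_TASK ))
echo "$NTASKS"    # 24 -- the value to pass via -n
```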
  
====== SLURM multiprog for uneven arrays ======
  
The [[https://slurm.schedmd.com/srun.html|SLURM multiprog]] option in ''srun'' essentially provides a master-slave setup. You need to run it within a SLURM job allocation and trigger ''srun'' with the ''%%--multi-prog%%'' option and an appropriate multiprog file:
  
<file bash master_slave_simple.sh>