====== Local Scratch Space ======

On every node, there is local scratch space available to your running jobs.
Every job can therefore use a directory called ''/localscratch/${SLURM_JOB_ID}/'' on the local disk. If a job array is started, this directory is likewise called ''/localscratch/${SLURM_JOB_ID}/''; the variable ''SLURM_ARRAY_TASK_ID'' is the index of a subjob within the job array and is unrelated to ''$SLURM_JOB_ID''.

<callout type="info" icon="true" title="When to use Local Scratch">
If your job(s) merely read and write big files in a linear fashion, there is no need to use the local scratch or a ramdisk. However, there are scenarios where using the local scratch might be beneficial:
  * if your job produces many temporary files
  * if your job reads a file or a set of files in a directory repeatedly during run time (multiple threads or concurrent jobs cause a random access pattern on the global file system, which is a true performance killer)
</callout>
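A minimal sketch of how this directory is typically used inside a job script (''input.dat'', ''result.out'' and ''my_program'' are placeholders, not files provided by the cluster):

<code bash>
#!/bin/bash
# minimal sketch -- stage data onto the node-local disk, work there, copy results back
JOBDIR=/localscratch/${SLURM_JOB_ID}

cp ${SLURM_SUBMIT_DIR}/input.dat ${JOBDIR}/
cd ${JOBDIR}
${SLURM_SUBMIT_DIR}/my_program input.dat > result.out
cp result.out ${SLURM_SUBMIT_DIR}/
</code>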
  
 <callout type="info" icon="true"> <callout type="info" icon="true">
Line 12: Line 18:
 </callout> </callout>
  
-**Attention:** This is //not// a shared filesystem!\\ 
 If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.\\ If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.\\
 If you need your input data on every node, please refer to the section [[slurm_localscratch#Copy_files_via_job_script|"Copy files to multiple nodes via job script"]]. If you need your input data on every node, please refer to the section [[slurm_localscratch#Copy_files_via_job_script|"Copy files to multiple nodes via job script"]].
Assume you would normally start the program in the current working directory, where it will read and write its data, like this:
<code bash>
$ sbatch -N1 -p parallel ./my_program
</code>
Now, to get the performance of local disk access, you want to use the aforementioned local scratch space on the compute node.
  
  
===== Copy files via job script and signalling batch scripts with SLURM =====
  
In the following example, a job script is submitted to which SLURM will send a signal shortly before the job ends. This enables the job script to collect the data written to the local scratch directory (or directories).
  
<file bash job.sh>
#!/bin/bash

#SBATCH -A <your slurm account>
#SBATCH -p parallel
#SBATCH -t <appropriate time>
#SBATCH --signal=B:SIGUSR2@600 # e.g. signal 10 minutes before the job will end;
                               # the time, here, is defined in seconds

# Store working directory to be safe
SAVEDPWD=$(pwd)

# We define a bash function to do the cleaning when the signal is caught
cleanup(){
    # Note: the following only works for output on the single node
    #       where the job script is running.
    #       For multi-node output, you can use the 'sgather' command or
    #       get in touch with us if the case is more complex.
    cp /localscratch/${SLURM_JOB_ID}/output_file ${SAVEDPWD}/ &
    cp /localscratch/${SLURM_JOB_ID}/restart_file ${SAVEDPWD}/ &
    wait # wait to ensure both concurrent copy processes are done
    exit 0
}

# Register the cleanup function when SIGUSR2 is sent,
# ten minutes before the job gets killed
trap 'cleanup' SIGUSR2

# Copy the input files to the local scratch directory
cp ${SAVEDPWD}/input_file /localscratch/${SLURM_JOB_ID}/
cp ${SAVEDPWD}/restart_file /localscratch/${SLURM_JOB_ID}/

# Go to the job directory and start the program in the background,
# so that the signal can be caught while we wait for it to finish
cd /localscratch/${SLURM_JOB_ID}
${SAVEDPWD}/my_program &
wait

# Call the cleanup function when everything went fine
cleanup
</file>
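The job script is then made executable and submitted as usual:

<code bash>
$ chmod +x ./job.sh
$ sbatch ./job.sh
</code>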
  
  
===== Signalling in SLURM -- difference between signalling submission scripts and applications =====
<callout type="info" icon="true">
<code bash>
$ sbatch --signal=SIGUSR2@600 ...
</code>
This would send the signal ''SIGUSR2'' to the application ten minutes before hitting the walltime of the job. Note that the SLURM documentation states that there is an uncertainty of up to 1 minute.
  
**Usually** this requires you to use
<code bash>
#SBATCH --signal=B:SIGUSR2@600 ...
</code>
within a submission script in order to signal the batch job itself (rather than all children of the batch job but not the batch job itself, which is the default). The reason is: when using a submission script like the one above, you trap the signal within the script, not within the application.
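If the application itself reacts to a specific signal (for instance to write a checkpoint), the trapped signal can be forwarded from the batch script to the application. The following is only a sketch; ''my_program'' is a placeholder for your actual application:

<code bash>
forward_signal() {
    # send the desired signal, e.g. SIGUSR2, to the stored PID
    kill -s SIGUSR2 ${APP_PID}
}

# trap the signal within the bash script
trap 'forward_signal' SIGUSR2

# start the application in the background and remember its PID
my_program &
APP_PID=$!

# wait for the application; if the trap interrupts 'wait',
# wait again until the application has really terminated
while kill -0 ${APP_PID} 2>/dev/null; do
    wait ${APP_PID}
done
</code>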
  
</callout>

===== Copy files to multiple nodes via job script =====
  
The following script can be used to ensure that input files are present in the job directory on **all** nodes.\\

The demonstrated ''sbcast'' command can also be used for the one-node example above.
  
<file bash job_multinode.sh>
#!/bin/bash

#SBATCH -N 2
# use other parameterization as appropriate

JOBDIR="/localscratch/${SLURM_JOB_ID}"

# copy the input file onto all nodes
sbcast <somefile> $JOBDIR/<somefile>

# NOTE: Unlike 'cp', which accepts a directory and would assume that
#       the destination file carries the same name, 'sbcast'
#       requires that a filename is given for the destination.
</file>
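To collect output that the individual nodes have written to their local scratch directories, the ''sgather'' command mentioned above can be used at the end of the job script. The following is only a sketch with a placeholder ''resultfile''; consult ''man sgather'' for details (e.g. the destination file name gets the source node's hostname appended):

<code bash>
# gather 'resultfile' from the local scratch directory of every node of this job
sgather ${JOBDIR}/resultfile ${SLURM_SUBMIT_DIR}/resultfile
</code>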
  
  