start:working_on_mogon:io_odds_and_ends:slurm_localscratch

Differences

This shows you the differences between two versions of the page.

start:working_on_mogon:io_odds_and_ends:slurm_localscratch [2020/10/19 13:51] – [Local Scratch Space] meesters
start:working_on_mogon:io_odds_and_ends:slurm_localscratch [2020/10/19 13:53] – [Signalling in SLURM -- difference between signalling submission scripts and applications] meesters
Line 12: Line 12:
 </callout> </callout>
  
-**Attention:** This is //not// a shared filesystem!\\ 
 If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.\\ If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.\\
 If you need your input data on every node, please refer to the section [[slurm_localscratch#Copy_files_via_job_script|"Copy files to multiple nodes via job script"]]. If you need your input data on every node, please refer to the section [[slurm_localscratch#Copy_files_via_job_script|"Copy files to multiple nodes via job script"]].
Line 100: Line 99:
 </code> </code>
  
-within a submission script to signal the batch job (instead of signalling all of its children but not the batch job itself). The reason is: if you use a submission script like the one above, you trap the signal within the script, not the application. If an application accepts a specific signal and you want to use this functionality, you can send the signal from within the script: +within a submission script to signal the batch job (instead of signalling all of its children but not the batch job itself). The reason is: if you use a submission script like the one above, you trap the signal within the script, not the application. 
- +
-<code bash> +
-# list of process IDs (PIDs) to signal +
-QUEUE="" +
- +
-function queue { +
-  QUEUE="$QUEUE $1" +
-} +
- +
-function forward_signal() { +
-  # this function might fulfill additional purposes, like +
-  # forwarding the signal, waiting for a checkpoint to be written +
-  # and then copying the last checkpoint back to the parallel file system +
-   +
-  # just send the desired signal, e.g. SIGUSR2, to every queued PID +
-  for pid in $QUEUE; do +
-    kill -s SIGUSR2 "$pid" +
-  done +
-} +
- +
-# trap the signal within the bash script +
-# it is possible to connect several functions with a signal +
-trap 'forward_signal' SIGUSR2 +
- +
-# start the desired application(s) - note the & +
-eval "my command and its parameters &" +
-# store the PID of the desired application(s) +
-queue $!  +
-# The sequence above needs to be carried out for every application instance +
-# you want to be signalled. +
-</code>+
  
 </callout> </callout>
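The signal-forwarding pattern from the removed snippet can be tried without a cluster. The following is only a minimal sketch under stated assumptions: ''queue'' and ''forward_signal'' mirror the snippet, while ''stand_in_app'', ''MARKER'', ''APP_PID'', ''RESULT'' and the helper that signals the script are hypothetical stand-ins for a real application and for Slurm delivering the signal.

```bash
#!/bin/bash
# Sketch only. queue and forward_signal mirror the snippet above;
# stand_in_app, MARKER and the self-sent signal are hypothetical stand-ins.

QUEUE=""                      # PIDs of the applications to signal

queue() {
  QUEUE="$QUEUE $1"
}

forward_signal() {
  # pass the trapped signal on to every queued PID
  for pid in $QUEUE; do
    kill -s USR2 "$pid" 2>/dev/null || true
  done
}

# trap the signal within the script (USR2 is the POSIX spelling of SIGUSR2)
trap 'forward_signal' USR2

MARKER=$(mktemp)              # file the pretend application writes to

stand_in_app() {
  # pretend application: on USR2 write a "checkpoint" marker and exit
  trap 'echo checkpoint > "$MARKER"; exit 0' USR2
  while true; do sleep 1; done
}

# start the application in the background and remember its PID
stand_in_app &
queue $!
APP_PID=$!

# stand-in for Slurm delivering the signal to the batch script
( sleep 2; kill -s USR2 $$ ) &

# wait returns early when a trapped signal arrives, so wait a second
# time until the application has really finished
wait || true
wait || true

RESULT=$(cat "$MARKER")
rm -f "$MARKER"
echo "application wrote: $RESULT"
```

In a real job the helper subshell would be dropped and the signal would come from Slurm instead, for example via ''scancel'' with its ''%%--%%signal'' and ''%%--%%batch'' options. The repeated ''wait'' is the important design point: without it the script would exit as soon as the trap has run, possibly before the application has written its checkpoint.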
- 
- 
 ===== Copy files to multiple nodes via job script ===== ===== Copy files to multiple nodes via job script =====
  
  • start/working_on_mogon/io_odds_and_ends/slurm_localscratch.txt
  • Last modified: 2022/06/20 18:05
  • by meesters