
Local scratch space

On every node, there is local scratch space available to your running jobs which you should use whenever possible. Every job can use a directory called /localscratch/${SLURM_JOB_ID}/ on the local disk. In a job array, each array task likewise gets its own directory called /localscratch/${SLURM_JOB_ID}/; the variable SLURM_ARRAY_TASK_ID is merely the index of the subjob within the job array and is unrelated to $SLURM_JOB_ID.

This is not a shared file system - data copied to one node will not be available on another node during the job's runtime.

This local scratch space is only accessible during the runtime of a given job. It will be cleaned up when the job ends.

Attention: This is not a shared filesystem!
If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.
If you need your input data on every node, please refer to the section "Copy files to multiple nodes via job script".
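
For illustration, the following minimal sketch (the array size is just an example) shows how a job script can construct its scratch path; in a job array, every task has its own $SLURM_JOB_ID and therefore its own scratch directory:

#!/bin/bash
#SBATCH -a 1-4 # example: a job array with four tasks

# Each array task has its own SLURM_JOB_ID and hence its own scratch directory;
# SLURM_ARRAY_TASK_ID only tells you which subjob of the array this is.
JOBDIR="/localscratch/${SLURM_JOB_ID}"
echo "array task ${SLURM_ARRAY_TASK_ID} uses ${JOBDIR}"
cd "${JOBDIR}"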

For the explanations on this page, we assume you have a program called my_program, which reads input data from ./input_file, writes output data to ./output_file and periodically writes a checkpoint file called ./restart_file. The program is to be executed on a whole node with 64 processors, e.g. using OpenMP.

Assume you would normally start the program in the current working directory where it will read and write its data like this:

$ sbatch -N1 -p nodeshort ./my_program # Mogon I
$ sbatch -N1 -p parallel  ./my_program # Mogon II

Now, to benefit from the performance of local disk access, you want to use the aforementioned local scratch space on the compute node.

Available Space

Please keep in mind that the free space in /localscratch/${SLURM_JOB_ID}/ might be shared with jobs of other users running on the same node. If you need the entire space to be available to you for the whole job, you should request the whole node, for example by allocating all of its CPUs.
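
For example (assuming the 64-core node from above), you could either allocate all cores explicitly or reserve the node with sbatch's --exclusive option; a minimal sketch:

#SBATCH -N 1
#SBATCH -n 64 # allocate all cores of the (assumed) 64-core node
# alternatively, reserve the whole node regardless of the core count:
##SBATCH --exclusive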

Copy files via job script

This method requires you to wrap your program in a small shell script like this:

job.sh
#!/bin/bash
 
# Store working directory to be safe
SAVEDPWD=$(pwd)
 
# We define a bash function to do the cleaning when the signal is caught
cleanup(){
    cp /localscratch/${SLURM_JOB_ID}/output_file ${SAVEDPWD}/ &
    cp /localscratch/${SLURM_JOB_ID}/restart_file ${SAVEDPWD}/ &
    wait
    exit 0
}
 
# Register the cleanup function when SIGUSR2 is sent,
# ten minutes before the job gets killed
trap 'cleanup' SIGUSR2
 
# Copy input file
cp ${SAVEDPWD}/input_file /localscratch/${SLURM_JOB_ID}
cp ${SAVEDPWD}/restart_file /localscratch/${SLURM_JOB_ID}
 
# Go to jobdir and start the program
cd /localscratch/${SLURM_JOB_ID}
${SAVEDPWD}/my_program 
 
# Call the cleanup function when everything went fine
cleanup

This script might be submitted with

$ chmod +x ./job.sh
$ sbatch -N1 -p nodeshort ./job.sh
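
Note that the cleanup trap in job.sh is only triggered if SLURM actually sends SIGUSR2 to the batch script, which has to be requested at submission time (see the signalling sections below), e.g.:

$ sbatch -N1 -p nodeshort --signal=B:SIGUSR2@600 ./job.sh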

Sending signals to jobs within SLURM

In case your application does not write to the job directory during the run, but only upon ending, you need to adapt the snippet above accordingly: the application has to be signalled and given time to write its output before the files are copied back to the parallel file system:

job_synchronous.sh
#!/bin/bash
 
 
#SBATCH -A <your slurm account>
# choose ONE partition, depending on the cluster:
#SBATCH -p nodeshort # on Mogon I
##SBATCH -p parallel # on Mogon II
#SBATCH -t <appropriate time>
#SBATCH --signal=B:SIGUSR2@600 # e.g. send the signal 10 minutes before the job will end
 
# Store working directory to be safe
SAVEDPWD=$(pwd)
 
# we need to save the PIDs generated by this script, whether it is starting several
# applications or only one:
QUEUE=""
 
# We define a bash function to do the cleaning when the signal is caught
cleanup(){
    # ATTENTION: If your application writes its output upon receiving a specific signal,
    # this very signal needs to be sent to the application with 'kill -s <signal>'.
    # 'kill -0' does not send a signal; it merely checks (quietly, thanks to the
    # redirection) whether the PID still exists, so we only signal live processes.
 
    # As several commands can be started within this script, we cycle over all PIDs.
    for PID in ${QUEUE}; do
        kill -0 ${PID} 2>/dev/null && kill -s SIGUSR2 ${PID}
    done
 
    # Subsequently, you give the program some time to exit gracefully and write the output.
    # Adjust or skip this delay, if necessary or appropriate.
    sleep 30
 
    cp /localscratch/${SLURM_JOB_ID}/output_file ${SAVEDPWD}/ &
    cp /localscratch/${SLURM_JOB_ID}/restart_file ${SAVEDPWD}/ &
    wait # wait to ensure both concurrent copy processes are done
    exit 0
}
# Register the cleanup function when SIGUSR2 is sent,
# ten minutes before the job gets killed
trap 'cleanup' SIGUSR2
 
# Copy input file
cp ${SAVEDPWD}/input_file /localscratch/${SLURM_JOB_ID}/
cp ${SAVEDPWD}/restart_file /localscratch/${SLURM_JOB_ID}/
 
# Go to jobdir and start the program in the background (note the '&'),
# so that the shell can react to the trapped signal while it is running
cd /localscratch/${SLURM_JOB_ID}
${SAVEDPWD}/my_program &
 
# We save our program's PID for later use (see the cleanup() function above).
# Storing the PID of the last background command (known in bash as '$!') needs
# to be done for every application start, e.g. in a loop.
QUEUE="${QUEUE} $!"
 
# Wait for the program to finish. 'wait' returns as soon as the program exits
# or the trapped signal arrives, so cleanup() can run in time.
wait
 
# Call the cleanup function when everything went fine
cleanup 

This script might be submitted with

$ chmod +x ./job_synchronous.sh
$ sbatch ./job_synchronous.sh

Signalling in SLURM -- difference between signalling submission scripts and applications

In SLURM, applications do not automatically receive a signal before hitting the walltime; this has to be requested explicitly:

$ sbatch --signal=SIGUSR2@600 ...

This would send the signal SIGUSR2 to the application ten minutes before the job hits its walltime. Note that the SLURM documentation states that there is an uncertainty of up to 1 minute in the delivery time.

Usually this requires you to use

$ sbatch --signal=B:SIGUSR2@600 ...

or rather

#SBATCH --signal=B:SIGUSR2@600 ...

within a submission script in order to signal the batch script itself (without the B: option, only the job steps, i.e. the children, are signalled, but not the batch script). The reason is: when using a submission script like the one above, you trap the signal within the script, not within the application. In case an application accepts a specific signal and you want to use this functionality, you can forward the signal from within the script:

# list of process IDs (PIDs) to signal
QUEUE=""
 
function queue {
  QUEUE="$QUEUE $1"
}
 
function forward_signal() {
  # this function might fulfil additional purposes, like waiting for a
  # checkpoint to be written and then copying the last checkpoint back
  # to the parallel file system
 
  # forward the desired signal, e.g. SIGUSR2, to every stored PID
  for PID in ${QUEUE}; do
    kill -s SIGUSR2 ${PID}
  done
}
 
# trap the signal within the bash script
# it is possible to connect several functions with a signal
trap 'forward_signal' SIGUSR2
 
# start the desired application(s) - note the &
eval "my command and its parameters &"
# store the PID of the desired application(s)
queue $! 
# The sequence above needs to be carried out for every application instance
# you want to be signalled.
 
# Finally, wait for all background applications; 'wait' returns as soon as the
# trapped signal arrives, so that forward_signal() can be executed in time.
wait

Copy files to multiple nodes via job script

The following script can be used to ensure that input files are present in the job directory on all nodes.
This is required e.g. for NAMD2, which in some cases reads input data on nodes other than the starting node.

This script is rather verbose; you might want to delete or comment out the echo lines.

Also, this script copies data from all nodes back into separate per-node directories named ${SAVEDPWD}/${SLURM_JOB_ID}/${host}. If your application only needs to read on every node but does not write on every node, you may want to use the cleanup function from the script posted above. The demonstrated sbcast command can also be used for the one-node example above.

job_multinode.sh
#!/bin/bash
 
#SBATCH -N 2 # assuming mogon I 'bulldozer' := 64 cores
#SBATCH -J 'namd2_128'
#SBATCH -p nodeshort
#SBATCH --mem 1800M
 
JOBDIR="/localscratch/${SLURM_JOB_ID}"
HOSTLIST=$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d, -s | tr ',' ' ')
echo $HOSTLIST
 
# Store working directory to be safe
SAVEDPWD=$(pwd)
 
# We define a bash function to do the cleaning when the signal is caught
cleanup() {
    # collect the result file from the local scratch of every node into a
    # separate per-node directory on the parallel file system
    # (sbcast only works in the other direction: it broadcasts a file TO all nodes)
    for host in ${HOSTLIST}; do
        mkdir -p "${SAVEDPWD}/${SLURM_JOB_ID}/${host}"
        srun -N1 -n1 -w "${host}" cp "${JOBDIR}/resultfile" "${SAVEDPWD}/${SLURM_JOB_ID}/${host}/" &
    done
    wait # ensure all copy processes are done
    exit 0
}
 
# Register the cleanup function when SIGUSR2 is sent,
# ten minutes before the job gets killed
trap 'cleanup' SIGUSR2
 
# copy the input file on all nodes
sbcast ${HOME}/NAMD_2.9/apoa1.namd $JOBDIR/apoa1.namd
# some applications only need the file on the 'master' node;
# in this case you can restrict yourself to:
# cp ${HOME}/NAMD_2.9/apoa1.namd $JOBDIR/apoa1.namd
 
# Go to jobdir and start the program; the command given on the sbatch command
# line (available as "$@") is executed with the broadcast input file as argument
cd "${JOBDIR}"
 
"$@" "${JOBDIR}/apoa1.namd"
 
# Call the cleanup function when everything went fine
cleanup

This script is used as follows; the command appended after the script name (here namd2) is what the script executes via "$@":

$ chmod +x ./job_multinode.sh
$ sbatch ./job_multinode.sh namd2 # after loading the appropriate module