Slurm Options

With Slurm there are three commands to reserve resource allocations and to submit jobs:

  • salloc: to reserve allocations for interactive tasks
  • srun: to run so-called job steps or small interactive jobs
  • sbatch: to submit jobs to a queue for processing

Extensive documentation on the salloc, srun and sbatch commands can be found in the Slurm documentation (salloc, srun, sbatch) or in the man pages for each command, e.g. $ man sbatch.

The most commonly used parameters for these commands are listed below. Detailed information on important options can also be found in separate articles.

Parameter List

-A, --account
    The project account that is billed for your job. Mandatory. Looking for your account?
    For example:
        -A m2_zdvhpc
        --account=hpckurs

-p, --partition
    The partition your job should run in. Mandatory. Look up available partitions.
    For example:
        -p parallel
        --partition=smp

-n, --ntasks
    Controls the number of tasks to be created for the job (= cores, if no advanced topology is given).
    For example:
        -n 4

-N, --nodes
    The number of nodes you need. For example:
        --nodes=2

-t, --time
    Sets the runtime limit of your job (within the partition constraints). For example, to specify 1 hour:
        -t 01:00:00
    More details on the format here.

-J, --job-name
    Sets an arbitrary name for your job that is used when listing jobs. Defaults to the script name.
    For example:
        --job-name=myjob

--ntasks-per-node
    Controls the maximum number of tasks per allocated node.

-c, --cpus-per-task
    Number of CPUs per task.

-C, --constraint
    Which processor architecture to use. For example:
        -C broadwell
        --constraint=skylake
    Read more about this constraint here.

--mem
    The amount of memory per node. Different units can be specified using the suffixes [K|M|G|T] (default is M for megabytes). See the memory reservation page for details and hints, particularly with respect to partition default memory settings.

--mem-per-cpu
    The amount of memory per CPU. See above for the units.

-o, --output
    Directs stdout and stderr into one file. (Slurm writes buffered; shell-based solutions do not write buffered.)
        -o <filename>.log
        -e <filename>.err
    Specifying both directs stdout to the log file and stderr to the error log file.

-i <filename>
    Instructs Slurm to connect the batch script's standard input directly to the specified file.

In the output and error filenames, you may use one or more replacement symbols: a percent sign “%” followed by a letter (e.g. %j). For example, job%4j.out yields job0128.out.

%A    Job array's master job allocation number.
%a    Job array ID (index) number.
%J    jobid.stepid of the running job (e.g. “128.0”).
%j    jobid of the running job.
%s    stepid of the running job.
%u    User name.
%x    Job name.
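
For example, the job name and job id can be combined into the log file names from within a jobscript:

#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err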

Other important parameters and features on MOGON are described in separate articles.

Once a job has been submitted, you can get information on it or control it with the commands described further below.

CPU Architecture

On MOGON II a third important parameter is available:

You may select the CPU type to be either skylake or broadwell for the Skylake and Broadwell nodes, respectively. If the architecture is not relevant for your application, select anyarch.

This can be set with:

  • -C <selection list> or
  • --constraint=<selection list>

to sbatch (on the command line or within a jobscript).
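
For example, on the command line:

sbatch -C skylake myjobscript

Or within a jobscript:

#SBATCH --constraint=broadwell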

The defaults are:

If nothing is specified, you will get broadwell, except for the himster2 partition, where it is going to be skylake. On the bigmem partition the default depends on your requested memory per node.

You can get a list of features and resources of each node with:

sinfo -o "%32N %5c %10m %20f %15G"

You will get an output similar to:

NODELIST                         CPUS  MEMORY     AVAIL_FEATURES       GRES
s[0020,0023],z[0001-0838]        40+   57000+     anyarch,broadwell    (null)
x[0001-0814,0901-0902,2001-2320] 64    88500+     anyarch,skylake      (null)
s[0027-0030]                     48    115500     anyarch,broadwell    gpu:gtx1080ti:6
s[0001-0019,0021-0022,0024-0026] 48    115500     anyarch,broadwell    gpu:gtx1080ti:6
dgx01                            80    490000     anyarch,broadwell    gpu:V100_16g:8
dgx02                            80    490000     anyarch,broadwell    gpu:V100_32g:8

Specifying Runtime

Requesting runtime is straightforward: The -t or --time flag can be used in srun/salloc and sbatch alike:

srun --time <time reservation>

Or within a script

#SBATCH -t <time reservation>

where <time reservation> can be any of the acceptable time formats:

  • minutes,
  • minutes:seconds,
  • hours:minutes:seconds,
  • days-hours,
  • days-hours:minutes and
  • days-hours:minutes:seconds.

Time resolution is one minute and second values are rounded up to the next minute. A time limit of zero requests that no time limit is imposed, meaning that the maximum runtime of the partition will be used.
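
For example, the following values are all valid time requests:

-t 30            (30 minutes)
-t 02:30:00      (2 hours and 30 minutes)
-t 2-12          (2 days and 12 hours)
-t 2-12:00:00    (2 days and 12 hours, equivalent to the previous line)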

Default Runtime

Most partitions have a default runtime of 10 minutes, after which jobs are automatically killed unless more time was requested using the -t flag. The default runtime for a partition can be checked with

scontrol show partition <partition>

The max wall time is the maximum requestable runtime within a partition. Jobs that need more time have to be split up and continued in a separate job.
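
For example, for the parallel partition used above, a possible way to filter the output for the relevant fields (DefaultTime and MaxTime):

scontrol show partition parallel | grep -Eo '(DefaultTime|MaxTime)=[^ ]*'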

Receiving mail notifications

Specify which types of mail notifications you want to receive with:

--mail-type=<TYPE>

<TYPE> can be any of:

  • NONE,
  • BEGIN,
  • END,
  • FAIL,
  • REQUEUE,
  • STAGE_OUT (burst buffer stage out and teardown completed),
  • INVALID_DEPEND (dependency never satisfied) or
  • ALL (equivalent to BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT)

Specify the receiving mail address using:

--mail-user=<username>@uni-mainz.de

The default value is the submitting user. We highly recommend using an internal address rather than relying on a third-party service.
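
For example, within a jobscript (with <username> as a placeholder, as above):

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<username>@uni-mainz.de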

Signals

Slurm does not send signals unless requested. However, there are situations when you may want to trigger a signal (e.g. in some IO workflows). You can request a specific signal with --signal, passed either to srun or to sbatch (on the command line or from within a script). The flag is used like --signal=<sig_num>[@<sig_time>]: when a job is within sig_time seconds of its end time, the signal sig_num is sent. If sig_num is specified without a sig_time, the default time will be 60 seconds. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified.

An example would be:

sbatch --signal=SIGUSR2@600 ...

Or within a script:

#SBATCH --signal=SIGUSR2@600

Here, the signal SIGUSR2 is sent to the application ten minutes before the job hits its walltime. Note once more that the Slurm documentation states that there is an uncertainty of up to one minute.
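
A minimal sketch of a jobscript that handles the signal itself: according to the sbatch documentation, the B: prefix is needed for the signal to be delivered to the batch shell rather than only to the job steps. The handler and ./my_app are placeholders for your own checkpointing logic and executable:

#!/bin/bash
#SBATCH --signal=B:SIGUSR2@600

# Hypothetical handler: a real one would e.g. trigger a checkpoint or
# forward the signal to the application before the walltime is reached.
checkpoint_handler() {
    echo "SIGUSR2 received, walltime is near" >&2
}
trap checkpoint_handler SIGUSR2

# Run the payload in the background so the batch shell can handle the signal.
srun ./my_app &
wait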

Cancel Jobs

Use the

scancel <jobid>

command with the jobid of the job you want to cancel.

In case you want to cancel all of your jobs, use -u or --user=:

scancel -u <username>

You can also restrict the operation to jobs in a certain state with -t or --state=:

scancel -t <jobstate>

where <jobstate> can be:

  • PENDING
  • RUNNING
  • SUSPENDED
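
The options can be combined, for example to cancel only your own pending jobs:

scancel -u <username> -t PENDING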

Using sbatch

You have to prepare a job script to submit jobs using sbatch. You can pass options to sbatch directly on the command-line or specify them in the job script file.
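
A minimal sketch of such a jobscript, reusing the example account, partition and options from the parameter list above; ./my_app is a placeholder for your executable:

#!/bin/bash
# account and partition (mandatory), see the parameter list above
#SBATCH -A m2_zdvhpc
#SBATCH -p parallel
# resources and runtime
#SBATCH -n 4
#SBATCH -t 01:00:00
# job name and log file (using the replacement symbols %x and %j)
#SBATCH -J myjob
#SBATCH -o %x.%j.out

srun ./my_app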

To submit your job use:

sbatch myjobscript

When does my Job start

A job is started either when it has the highest priority and the required resources are available, or when it has the opportunity to be backfilled. The following command gives an estimate of the time and date when your job is supposed to start, but note that the estimate is based on the current workload:

squeue --start

Slurm cannot anticipate that higher-priority jobs will be submitted after yours, that machine downtime will leave fewer resources for jobs, or that job crashes will let large jobs start earlier than expected, causing smaller jobs scheduled for backfilling to lose that backfill opportunity.
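
The estimate can be limited to your own jobs with the -u option of squeue:

squeue --start -u <username>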

Slurm-based Job Monitoring

For running jobs, you can retrieve information on memory usage with sstat. Detailed information on exactly which slots your job is assigned to can be retrieved with the following command:

scontrol show -d job <jobid>
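
A sketch of such an sstat call for a running job (the exact set of available fields may vary between Slurm versions):

sstat -j <jobid> --format=JobID,MaxRSS,MaxVMSize,AveCPU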

For completed jobs, this information is provided by sacct, e.g.:

sacct --format JobID,Jobname,NTasks,Nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize

For completed jobs, you can also use seff, which reports on the efficiency of a job’s CPU and memory utilisation.

seff <jobid>