Submitting Jobs
With SLURM there are three commands to reserve resource allocations and to submit jobs: salloc, srun and sbatch. They are used to submit jobs (sbatch), to reserve allocations for interactive tasks (salloc) and to run so-called job steps (see below) or small interactive jobs (srun).
Basically, most scripts are submitted using sbatch:
$ sbatch <scriptfile>
where the scriptfile holds several special options preceded with #SBATCH, e.g.
#!/bin/bash             # or could be another language
#SBATCH -J my_job_name  # etc.
Detailed information on important options follows below.
Extensive documentation on the salloc, srun and sbatch commands can be found in the SLURM documentation (salloc, srun, sbatch) or in the man pages for each command, e.g. $ man sbatch.
Do not forget the mandatory parameters -A (account) and -p (partition). See Accounts and Accounting for details.
Important parameters
There are some important parameters that are always required or at least recommended to use:
Mandatory | Without these your jobs are rejected |
-A <account> | The project account (not your user id) your job should be accounted for. Use sacctmgr -s list user $USER format=user%10,account%20 to retrieve the associated accounts. See Accounts and Accounting for details. |
-p <partition> | The partition your job should run in. Available partitions. |
Necessary | Lack of these options might lead to default settings and/or unexpected allocations |
-n <tasks> | Controls the number of tasks to be created for the job (=cores, if no advanced topology is given). |
-N <nodes> | The number of nodes you need. |
-t <minutes> or -t <hours>:<min>:<sec> | Set the runtime limit of your job (up to the maximum allowed by the selected partition). See Specifying Runtime below. |
-J <jobname> | Sets an arbitrary name for your job that is used for listing of jobs. |
Optional | |
--ntasks-per-node | Controls the maximum number of tasks per allocated node. |
--cpus-per-task | Number of CPUs per task. |
--mem | The amount of memory per node. Different units can be specified using the suffix [K|M|G|T] (default is 'M' for megabytes). See the Memory reservation page for details and hints, particularly with respect to partition default memory settings. |
--mem-per-cpu | Amount of memory per CPU. See above for the units. |
To specify files for output, error and input, consider the following:
-o <filename> | Will direct stdout and stderr into one file. |
-o <filename>.log -e <filename>.err | Will direct stdout to the log file and stderr to the error log file. |
-i <filename> | Instruct Slurm to connect the batch script's standard input directly to the file name specified. |
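For instance, a job script header might combine these options like this (the file names are arbitrary examples):
#SBATCH -o myjob.log   # stdout
#SBATCH -e myjob.err   # stderr
#SBATCH -i input.dat   # stdin for the batch script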
Specifying Runtime
Requesting runtime is straightforward: the -t or --time flag can be used in srun/salloc and sbatch alike:
$ srun --time <time reservation>
or within a script
#SBATCH -t <time reservation>
where <time reservation> can be any of the acceptable time formats: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds.
Time resolution is one minute and second values are rounded up to the next minute. A time limit of zero requests that no time limit be imposed, meaning that the maximum runtime of the partitions will be used.
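To illustrate the accepted formats (the values are arbitrary examples):
#SBATCH -t 30           # 30 minutes
#SBATCH -t 2:30:00      # 2 hours and 30 minutes
#SBATCH -t 1-12         # 1 day and 12 hours
#SBATCH -t 1-12:30:00   # 1 day, 12 hours and 30 minutes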
Signals
Slurm does not send signals if not requested. However, there are situations where you may want to trigger a signal (e.g. in some IO-workflows). You can request a specific signal by passing --signal to srun or sbatch (also from within a script). The flag is used like --signal=<sig_num>[@<sig_time>]: when a job is within sig_time seconds of its end time, the signal sig_num is sent. If a sig_num is specified without a sig_time, the default time is 60 seconds. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified.
An example would be
$ sbatch --signal=SIGUSR2@600 ...
or within a script
#SBATCH --signal=SIGUSR2@600
Here, the signal SIGUSR2 is sent to the application ten minutes before the job hits its walltime. Note once more that the Slurm documentation states that there is an uncertainty of up to 1 minute.
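A minimal sketch of a batch script reacting to such a signal could look as follows. It uses the B: prefix (which directs the signal to the batch shell rather than to the job steps, see man sbatch) and a placeholder checkpoint routine that you would have to adapt to your application:
#!/bin/bash
#SBATCH -J signal_demo
#SBATCH -p <partition>
#SBATCH -A <account>
#SBATCH -t 01:00:00
#SBATCH --signal=B:SIGUSR2@600   # SIGUSR2 to the batch shell ~10 minutes before the walltime

# placeholder checkpoint routine -- adapt to your application
checkpoint_and_exit() {
    echo "SIGUSR2 received, writing checkpoint" >&2
    # <your checkpoint/cleanup commands here>
    exit 0
}
trap checkpoint_and_exit SIGUSR2

# start the application in the background and wait, so that the trap can fire
srun <myexecutable> &
wait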
CPU Architecture
The CPU architecture can be requested as a constraint: skylake or broadwell for the Skylake and Broadwell nodes, respectively. If the architecture is not relevant for your application, select anyarch.
This can be set by passing -C <selection list> or --constraint=<selection list> to sbatch (on the command line or within a jobscript).
The defaults are:
- broadwell in the ''parallel'' partition
- skylake on the himster2 cluster (only applicable for HIM employees)
If nothing is specified you'll get broadwell, except for the himster2 partition where it's going to be skylake. On the bigmem partition it will depend on your requested memory per node.
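For example, to explicitly request Skylake nodes:
#SBATCH -C skylake
or, equivalently, on the command line:
$ sbatch --constraint=skylake myjobscript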
Receiving mail notifications
--mail-type=<type> | Specifies which types of mails a user wants to receive. Can be any of NONE, BEGIN, END, FAIL, REQUEUE or ALL. |
--mail-user=<username>@uni-mainz.de | Specifies the receiving mail address. We highly recommend using an internal address rather than relying on a third-party service. |
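For example (the address is a placeholder, the types are taken from the list above):
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<username>@uni-mainz.de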
In the file names given to -o and -e you may use one or more replacement symbols, which are a percent sign “%” followed by a letter (e.g. %j). For example, job%4j.out yields job0128.out.
%A | Job array's master job allocation number. |
%a | Job array ID (index) number. |
%J | jobid.stepid of the running job. (e.g. “128.0”) |
%j | jobid of the running job. |
%s | stepid of the running job. |
%u | User name. |
%x | Job name. |
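As a sketch, combining the job name and the job id in the output file names:
#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err
would yield e.g. mympijob.3242.out and mympijob.3242.err.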
Other important parameters / features on Mogon include:
- Using the ramdisk
- Using local scratch space
- Specifying runtimes
- Using GPU Queues
Interactive Jobs
In order to test or visualize, it is sometimes handy to allocate resources for an interactive job. SLURM provides two commands to facilitate this: srun and salloc.
Please note that our policies forbid using login nodes for prolonged interactive work, which may inhibit the workflow of others. You can use interactive reservations as described in the following paragraphs instead:
Simple Interactive Work with ''srun''
To get an interactive shell you can just run:
$ srun --pty -p <partition name> -A <account name> bash -i
You can reserve more time, memory or CPUs as well.
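For instance, the following (a sketch with arbitrary values) requests one task with 4 CPUs, 4 GB of memory and a two hour limit:
$ srun --pty -p <partition name> -A <account name> -n 1 -c 4 --mem=4G -t 02:00:00 bash -i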
Allocation with ''salloc''
To quote the official documentation: salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user.
An example:
$ salloc -N 2 -p parallel -A zdvhpc
salloc: Granted job allocation 3242
salloc: Waiting for resource configuration
salloc: Nodes z[0001-0002] are ready for job
$ # now you can use two nodes and start the desired application
$ # e.g.
$ srun -N1 [other parameters] <some application confined on one node>
$ srun [other parameters] <some application triggered on all nodes>
$ srun [other parameters] <some mpi application>
$ # do not forget to type 'exit' or else you will be working in a subshell
$ exit
During a session with salloc you may log in to the allocated nodes (with ssh) and monitor their behaviour. This can be handy to estimate memory usage, too.
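For instance (the node name is taken from the example above; the monitoring commands are only suggestions):
$ squeue -u $USER   # shows your running jobs and the allocated nodes
$ ssh z0001
$ free -h           # check the current memory usage on the node
$ exit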
Using sbatch
You have to prepare a job script to submit jobs using sbatch (for interactive jobs see srun or salloc).
You can pass options to sbatch directly on the command-line or specify them in the job script file.
To submit your job use
$ sbatch myjobscript
Trivial example - single core job
- myjobscript.slurm
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run single core applications on
# MOGON.
#
# This script requests one core (out of 20) on one Broadwell-node.
# The job will have access to the default memory of the partition.
#-----------------------------------------------------------------

#SBATCH -J mysimplejob          # Job name
#SBATCH -o mysimplejob.%j.out   # Specify stdout output file (%j expands to jobId)
#SBATCH -p smp                  # Queue name 'smp' or 'parallel' on Mogon II
#SBATCH -n 1                    # Total number of tasks, here explicitly 1
#SBATCH --mem 300M              # The default is 300M memory per job. You'll likely have to adapt this to your needs
#SBATCH -t 00:30:00             # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A account              # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.

# Launch the executable
srun <myexecutable>
Trivial example - full node job - threaded application
In contrast to the previous example, the following will launch one task on 20 cores. Be careful: Most applications do not scale that far.
- mysmpjobscript.slurm
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run threaded (SMP) applications on
# MOGON.
#
# This script requests all 20 cores of one Broadwell-node for a
# single task. The job will have access to all the memory in the
# node and will be charged for all 20 cores.
#-----------------------------------------------------------------

#SBATCH -J mysimplejob          # Job name
#SBATCH -o mysimplejob.%j.out   # Specify stdout output file (%j expands to jobId)
#SBATCH -p parallel             # Queue name
#SBATCH -N 1                    # Total number of nodes requested (20 cores/node on a standard Broadwell-node)
#SBATCH -n 1                    # Total number of tasks
#SBATCH -c 20                   # Total number of cores for the single task
#SBATCH -t 00:30:00             # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A account              # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0

# Launch the executable with one task distributed on 20 cores:
srun <myexecutable>
Simple MPI Job
- myjobscript.slurm
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run an MPI job on MOGON.
#
# This script requests 40 MPI-tasks on two Broadwell-nodes. The job
# will have access to all the memory in the nodes.
#-----------------------------------------------------------------

#SBATCH -J mympijob             # Job name
#SBATCH -o mympijob.%j.out      # Specify stdout output file (%j expands to jobId)
#SBATCH -p parallel             # Partition/Queue name
#SBATCH -N 2                    # Total number of nodes requested (20 tasks/node)
#SBATCH -n 40                   # Total number of tasks
#SBATCH -t 00:30:00             # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A <account>            # Specify account to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load <appropriate module(s)>

# Launch the executable
srun <myexecutable>
Hybrid MPI-OpenMP Jobs (using GROMACS as an example)
Whereas MPI applications frequently adhere to the standard MPI idea of parallelization by multiprocessing and exchanging messages between the created processes, hybrid applications use internally threaded processes (e.g. MPI-tasks which in turn spawn OpenMP-threads).
For this example we assume you want to run GROMACS on 2 Skylake-nodes (which have 32 cores per node) with 32 MPI-tasks using 2 cores each and 2 OpenMP-threads per MPI-task. The job script could look something like this:
- myjobscript.slurm
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run GROMACS with MPI on MOGON.
#
# This script requests 64 cores on two nodes. The job
# will have access to all the memory in the nodes.
#-----------------------------------------------------------------

#SBATCH -J mygromacsjob         # Job name
#SBATCH -o mygromacsjob.%j.out  # Specify stdout output file (%j expands to jobId)
#SBATCH -p parallel             # Partition/Queue name
#SBATCH -C skylake              # select 'skylake' architecture
#SBATCH -N 2                    # Total number of nodes requested (32 cores/node)
#SBATCH -t 00:30:00             # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A <account>            # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load bio/GROMACS         # you can select a specific version, too

# Launch the executable
srun -n 32 -c 2 gmx_mpi mdrun -ntomp 2 -deffnm em