Submitting Jobs
salloc, srun, and sbatch are used to reserve job slots, run tasks interactively, or submit a job to SLURM. Most scripts are submitted using sbatch:
$ sbatch <scriptfile>
where the scriptfile holds several special options preceded by #SBATCH, e.g.
#!/bin/bash              # or could be another language
#SBATCH -J my_job_name
# etc.
Detailed information on important options follows below.
Extensive documentation on the salloc, srun, and sbatch commands can be found in the SLURM documentation: salloc, srun, sbatch.
Important parameters
There are some important parameters that are always required or at least recommended; a combined example follows the table below:
Mandatory | Without these, your jobs are rejected. |
-A <account> | The account your job should be charged to. |
-p <partition> | The partition your job should run in. Available partitions. |
Highly Recommended | Omitting these options may lead to default settings and/or unexpected allocations. |
-n <tasks> | Controls the number of tasks to be created for the job (= cores, if no advanced topology is given). |
-N <nodes> | The number of nodes you need. |
-t <minutes> or -t <hour>:<min> | Sets the runtime limit of your job (up to the maximum allowed by the selected partition). See the description for host model considerations. |
-J <jobname> | Sets an arbitrary name for your job that is used in email notifications and job listings. |
Optional | |
--ntasks-per-node=<ntasks> | Controls the maximum number of tasks per allocated node. |
--cpus-per-task=<ncpus> | Number of CPUs per task. |
--mem=<size> | The amount of memory per node. Different units can be specified using the suffix [K|M|G|T]. |
--mem-per-cpu=<size> | Amount of memory per CPU. See above for units. |
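Putting the mandatory and recommended parameters together, a minimal job script header might look like the following sketch (the account, partition, and resource values are placeholders and must be adapted to your project):
#!/bin/bash
#SBATCH -A <account>       # mandatory: account to charge
#SBATCH -p <partition>     # mandatory: partition to run in
#SBATCH -n 4               # 4 tasks (= 4 cores without advanced topology)
#SBATCH -N 1               # on a single node
#SBATCH -t 01:30:00        # runtime limit of 1.5 hours
#SBATCH -J example_job     # job name
#SBATCH --mem=4G           # memory for the node allocation (placeholder value)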
To specify files for output, error and input, consider the following:
-o <filename> | Directs stdout and stderr into one file. |
-o <filename>.log -e <filename>.err | Directs stdout to the log file and stderr to the error log file. |
-i <filename> | Instructs Slurm to connect the batch script's standard input directly to the file specified. |
You may use one or more replacement symbols, which are a percent sign “%” followed by a letter (e.g. %j). For example, job%4j.out yields job0128.out. A combined sketch follows the table below.
%A | Job array's master job allocation number. |
%a | Job array ID (index) number. |
%J | jobid.stepid of the running job. (e.g. “128.0”) |
%j | jobid of the running job. |
%s | stepid of the running job. |
%u | User name. |
%x | Job name. |
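As a sketch, the replacement symbols can be combined with the output options above to name log files after the job name and job ID (the file names and the expanded job ID are illustrative):
#SBATCH -J myjob
#SBATCH -o %x.%j.log       # stdout, expands to e.g. myjob.4711.log
#SBATCH -e %x.%j.err       # stderr, expands to e.g. myjob.4711.err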
Other important parameters / features on mogon include:
- Submitting job arrays and specific considerations
- Using the ramdisk
- Using Local scratch space
- Specifying runtimes
- Using GPU Queues
Once a job has been submitted, you can get information on it or control it with this list of commands.
Interactive Jobs
For testing or visualization it is sometimes handy to allocate resources for an interactive job. SLURM provides two commands to facilitate this: srun and salloc.
Please note that our policies forbid using the login nodes for prolonged interactive work, which may inhibit the workflow of others. You can use interactive reservations as described in the following paragraphs instead:
Simple Interactive Work with ''srun''
To get an interactive shell you can just run:
$ srun --pty -p <partition name> -A <account name> bash -i
You can reserve more time, memory, or CPUs as well, as sketched below.
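For instance, a sketch of a more generous interactive reservation (partition, account, and resource values are placeholders):
$ srun --pty -p <partition name> -A <account name> -n 1 -c 4 --mem=8G -t 02:00:00 bash -i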
Allocation with ''salloc''
To quote the official documentation: salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user.
An example:
$ salloc -N 2 -p nodeshort -A zdvhpc
salloc: Granted job allocation 3242
salloc: Waiting for resource configuration
salloc: Nodes a[0001-0002] are ready for job
$ # now you can use two nodes and start the desired application
$
$ # do not forget to type 'exit' or else you will be working in a subshell
$ exit
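Inside such an allocation, srun launches tasks on the granted nodes; a minimal sketch (the executable is a placeholder):
$ srun -N 2 hostname        # run a command once on each of the two allocated nodes
$ srun -n 2 <myexecutable>  # or launch the desired application across the allocation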
Using sbatch
You have to prepare a job script to submit jobs using the sbatch command.
You can pass options to sbatch directly on the command line or specify them in the job script file (see the sketch below the submission example). Slurm will reject jobs that do not set -A (account) and -p (partition).
To submit your job use
login21$ sbatch myjobscript
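Options given on the command line take precedence over the #SBATCH directives in the script; as a sketch (account and partition are placeholders):
login21$ sbatch -A <account> -p <partition> -t 00:10:00 myjobscript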
Trivial example - full node job
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one core (out of 64) on one node. The job
# will have access to all the memory in the node. Note that this
# job will be charged as if all 64 cores were requested.
#-----------------------------------------------------------------

#SBATCH -J mysimplejob           # Job name
#SBATCH -o mysimplejob.%j.out    # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 1                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 1                     # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A account               # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0

# Launch the executable
<myexecutable>
Simple MPI Job
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run an MPI job on Mogon.
#
# This script requests 128 cores on two nodes. The job
# will have access to all the memory in the nodes.
#-----------------------------------------------------------------

#SBATCH -J mympijob              # Job name
#SBATCH -o mympijob.%j.out       # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 2                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 128                   # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A account               # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0
module load mpi/intelmpi/2017

# Launch the executable
srun -n 128 <myexecutable>