Submitting Jobs

salloc, srun, and sbatch are used to reserve job slots, run tasks interactively, or submit a job to SLURM.

Most jobs are submitted as scripts using sbatch:

$ sbatch <scriptfile>

where the script file contains special options on lines preceded by #SBATCH, e.g.

#!/bin/bash # or could be another language
 
#SBATCH -J my_job_name 
 
# etc.

Detailed information on the most important options follows below.


Extensive documentation on these commands can be found in the SLURM documentation:

man salloc: https://slurm.schedmd.com/salloc.html
man srun:   https://slurm.schedmd.com/srun.html
man sbatch: https://slurm.schedmd.com/sbatch.html

Important parameters

There are some important parameters that are always required, or at least strongly recommended:

Mandatory (without these your job is rejected):

-A <account>       The account your job should be charged to.
-p <partition>     The partition your job should run in (see the list of available partitions).

Highly recommended (omitting these options may lead to default settings and/or unexpected allocations):

-n <tasks>         Number of tasks to be created for the job (= cores, if no advanced topology is given).
-N <nodes>         Number of nodes you need.
-t <minutes>
-t <hours>:<minutes>:<seconds>
                   Runtime limit of your job (up to the maximum allowed by the selected partition). See the partition description for host model considerations.
-J <jobname>       An arbitrary name for your job, used in e-mail notifications and job listings.

Optional:

--ntasks-per-node  Maximum number of tasks per allocated node.
--cpus-per-task    Number of CPUs per task.
--mem              Amount of memory per node. Units can be specified with the suffix [K|M|G|T].
--mem-per-cpu      Amount of memory per CPU. Units as above.
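
Taken together, a typical job script header combining these options might look as follows (a minimal sketch; account, partition and resource values are placeholders that you need to adapt to your project):

#!/bin/bash
#SBATCH -A <account>        # account to charge
#SBATCH -p <partition>      # partition to run in
#SBATCH -N 1                # number of nodes
#SBATCH -n 4                # number of tasks
#SBATCH -t 00:30:00         # runtime limit (hh:mm:ss)
#SBATCH -J example_job      # job name
#SBATCH --mem 4G            # memory per node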

To specify files for output, error and input, consider the following:

-o <filename>                         Directs both stdout and stderr into the given file.
-o <filename>.log -e <filename>.err   Directs stdout to the log file and stderr to the error file.
-i <filename>                         Connects the batch script's standard input to the specified file.

You may use one or more replacement symbols, which are a percent sign “%” followed by a letter (e.g. %j).
For example, job%4j.out yields job0128.out

%A Job array's master job allocation number.
%a Job array ID (index) number.
%J jobid.stepid of the running job. (e.g. “128.0”)
%j jobid of the running job.
%s stepid of the running job.
%u User name.
%x Job name.
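
For instance, the following directives (a sketch; the job name is arbitrary) write stdout and stderr to files named after the job name and job ID:

#SBATCH -J myjob
#SBATCH -o %x.%j.out        # e.g. myjob.4711.out for job ID 4711
#SBATCH -e %x.%j.err        # e.g. myjob.4711.err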

Mogon offers further important parameters and features beyond those listed here.

Once a job has been submitted, you can inspect or control it with commands such as squeue, scancel and scontrol.
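
For example (a short sketch; <jobid> stands for the ID printed at submission time):

$ squeue -u $USER               # list your pending and running jobs
$ scontrol show job <jobid>     # show detailed information on a single job
$ scancel <jobid>               # cancel a job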

Interactive Jobs

For testing or visualization it is sometimes handy to allocate resources for an interactive job. SLURM provides two commands to facilitate this: srun and salloc.

Please note that our policies forbid using the login nodes for prolonged interactive work, as this may inhibit the workflow of others. Use interactive reservations as described in the following paragraphs instead:

Simple Interactive Work with ''srun''

To get an interactive shell you can just run:

$ srun --pty -p <partition name> -A <account name> bash -i

You can also request more time, memory or CPUs, as shown below.
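
The following sketch requests two hours of runtime, 4 GB of memory and 4 CPUs for the interactive shell (partition and account names are placeholders):

$ srun --pty -p <partition name> -A <account name> -t 02:00:00 --mem 4G -c 4 bash -i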

Allocation with ''salloc''

To quote the official documentation: salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user.

An example:

$ salloc -N 2 -p nodeshort -A zdvhpc
salloc: Granted job allocation 3242
salloc: Waiting for resource configuration
salloc: Nodes a[0001-0002] are ready for job
$ # now you can use two nodes and start the desired application
$ 
$ # do not forget to type 'exit' or else you will be working in a subshell
$ exit
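
Within the allocation, srun launches commands on the allocated nodes. A minimal sketch (hostname is only an illustrative command):

$ srun -N 2 -n 2 hostname       # runs one task on each of the two allocated nodes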

Using sbatch

You have to prepare a job script to submit jobs using the sbatch command.

You can pass options to sbatch directly on the command line or specify them in the job script file. SLURM will reject jobs that do not set -A (account) and -p (partition).

To submit your job use

login21$ sbatch myjobscript
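
On success, sbatch replies with the ID of the newly created job (the number shown here is only an example):

Submitted batch job 123456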

Trivial example - full node job

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one core (out of 64) on one node. The job
# will have access to all the memory in the node.  Note that this
# job will be charged as if all 64 cores were requested.
#-----------------------------------------------------------------
 
#SBATCH -J mysimplejob           # Job name
#SBATCH -o mysimplejob.%j.out    # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 1                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 1                     # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
 
#SBATCH -A account               # Specify allocation to charge against
 
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0
 
# Launch the executable
<myexecutable>

Simple MPI Job

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run MPI Job on Mogon.
#
# This script requests 128 cores on two nodes. The job
# will have access to all the memory in the nodes.  
#-----------------------------------------------------------------
 
#SBATCH -J mympijob              # Job name
#SBATCH -o mympijob.%j.out       # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 2                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 128                   # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
 
#SBATCH -A account               # Specify allocation to charge against
 
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0
module load mpi/intelmpi/2017
 
# Launch the executable
srun -n 128 <myexecutable>