====== Submitting Jobs ======

''salloc'', ''srun'', and ''sbatch'' are used to reserve job slots, run tasks interactively, or submit a job to SLURM.

Most job scripts are submitted using ''sbatch'':
<code bash>
$ sbatch <scriptfile>
</code>
where the script file holds several special options preceded by ''#SBATCH'', e.g.
<code bash>
#!/bin/bash # or another interpreter

#SBATCH -J my_job_name

# etc.
</code>

Detailed information on important options follows below.

----

Extensive documentation on the ''salloc'', ''srun'' and ''sbatch'' commands can be found in the SLURM documentation: [[https://slurm.schedmd.com/salloc.html|salloc]], [[https://slurm.schedmd.com/srun.html|srun]], [[https://slurm.schedmd.com/sbatch.html|sbatch]].

===== Important parameters =====

There are some important parameters that are always required or at least recommended:

| **Mandatory** | Without these your jobs are rejected |
| ''-A <account>'' | The account your job should be charged to. |
| ''-p <partition>'' | The partition your job should run in.\\ [[partitions|Available partitions]]. |
| **Highly Recommended** | Omitting these options may lead to default settings and/or unexpected allocations |
| ''-n <tasks>'' | Controls the number of tasks to be created for the job (= cores, if no advanced topology is given). |
| ''-N <nodes>'' | The number of nodes you need. |
| ''-t <minutes>'' \\ ''-t <hours>:<minutes>:<seconds>'' | Sets the runtime limit of your job (up to the maximum allowed by the selected [[partitions|partition]]). See [[runtime|the description]] for host model considerations. |
| ''-J <jobname>'' | Sets an arbitrary name for your job that is used in email notifications and job listings. |
| **Optional** | |
| ''%%--%%ntasks-per-node'' | Controls the maximum number of tasks per allocated node. |
| ''%%--%%cpus-per-task'' | Number of CPUs per task. |
| ''%%--%%mem'' | The amount of memory per node. Different units can be specified using the suffix ''[K|M|G|T]''. |
| ''%%--%%mem-per-cpu'' | The amount of memory per CPU. See above. |
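
For illustration, a submission combining the mandatory and recommended options could look like this (the angle-bracket placeholders have to be replaced with real values for your project):

<code bash>
# placeholders: replace <account>, <partition> and <scriptfile> with your own values
$ sbatch -A <account> -p <partition> -N 1 -n 64 -t 00:30:00 -J my_job_name <scriptfile>
</code>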

To specify files for output, error and input, consider the following:
| ''-o <filename>'' | Directs stdout and stderr into one file. |
| ''-o <filename>.log -e <filename>.err'' | Directs stdout to the log file and stderr to the error log file. |
| ''-i <filename>'' | Instructs Slurm to connect the batch script's standard input directly to the file specified. |

You may use one or more replacement symbols, which are a percent sign "%" followed by a letter (e.g. %j).\\
For example, ''job%4j.out'' yields ''job0128.out''

| %A | Job array's master job allocation number. |
| %a | Job array ID (index) number. |
| %J | jobid.stepid of the running job. (e.g. "128.0") |
| %j | jobid of the running job. |
| %s | stepid of the running job. |
| %u | User name. |
| %x | Job name. |
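
For example, the following directives (the job ID ''4711'' is only illustrative) write separate log and error files containing the job name and job ID:

<code bash>
#SBATCH -J my_job_name
#SBATCH -o %x.%j.log    # becomes e.g. my_job_name.4711.log
#SBATCH -e %x.%j.err    # becomes e.g. my_job_name.4711.err
</code>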

Other important parameters / features on Mogon include:

  * Submitting [[job_arrays|job arrays]] and specific considerations (see the sketch below)
  * Using the [[ramdisk|ramdisk]]
  * Using [[local_scratch|local scratch space]]
  * Specifying [[runtime|runtimes]]
  * Using [[lsf_gpu|GPU Queues]]
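
As a brief sketch of the job array item above (the index range, ''myprogram'' and the input file names are placeholders), an array can be requested with the standard Slurm ''%%--%%array'' option; each array task receives its own index via ''SLURM_ARRAY_TASK_ID'':

<code bash>
#SBATCH --array=1-10             # 10 array tasks with indices 1..10
#SBATCH -o myarray.%A_%a.out     # %A = master job ID, %a = array index

# each array task processes its own input file
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat
</code>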

<WRAP center round info 80%>
Once a job has been submitted you can get information on it or control it with this [[job_control_and_information|list of commands]].
</WRAP>

===== Interactive Jobs =====

For testing or visualization it is sometimes handy to allocate resources for an interactive job. SLURM provides two commands to facilitate this: ''srun'' and ''salloc''.

Please note that our policies forbid using login nodes for [[policies_loginnodes|prolonged interactive work]], which may inhibit the workflow of others. Instead, you can use interactive reservations as described in the following paragraphs:

==== Simple Interactive Work with ''srun'' ====

To get an interactive shell you can just run:

<code bash>
$ srun --pty -p <partition name> -A <account name> bash -i
</code>

You can request more time, memory or CPUs as well.
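
For instance (the resource values below are arbitrary), an interactive shell with 4 CPUs, 8 GB of memory and a two-hour limit could be requested like this:

<code bash>
$ srun --pty -p <partition name> -A <account name> -c 4 --mem=8G -t 02:00:00 bash -i
</code>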

==== Allocation with ''salloc'' ====

To quote the official documentation: ''salloc'' is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user.

An example:

<code bash>
$ salloc -N 2 -p nodeshort -A zdvhpc
salloc: Granted job allocation 3242
salloc: Waiting for resource configuration
salloc: Nodes a[0001-0002] are ready for job
$ # now you can use two nodes and start the desired application

$ # do not forget to type 'exit' or else you will be working in a subshell
$ exit
</code>
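
Within the allocation, ''srun'' launches tasks on the allocated nodes; a quick check (the hostnames shown are only illustrative) could look like:

<code bash>
$ srun hostname    # by default runs one task per allocated node
a0001
a0002
</code>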

===== Using sbatch =====

You have to prepare a job script to submit jobs using the ''sbatch'' command.

You can pass options to ''sbatch'' directly on the command line or specify them in the job script file. Slurm will reject jobs that do not set ''-A'' (account) and ''-p'' (partition).

To submit your job use

<code>
login21$ sbatch myjobscript
</code>
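
Options given on the command line take precedence over the corresponding ''#SBATCH'' directives in the script; for example (account and partition are placeholders):

<code>
login21$ sbatch -A <account> -p <partition> myjobscript
</code>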

==== Trivial example - full node job ====

<file myjobscript>
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one core (out of 64) on one node. The job
# will have access to all the memory in the node. Note that this
# job will be charged as if all 64 cores were requested.
#-----------------------------------------------------------------

#SBATCH -J mysimplejob           # Job name
#SBATCH -o mysimplejob.%j.out    # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 1                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 1                     # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours

#SBATCH -A <account>             # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0

# Launch the executable
<myexecutable>
</file>


==== Simple MPI Job ====

<file myjobscript>
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run an MPI job on Mogon.
#
# This script requests 128 cores on two nodes. The job
# will have access to all the memory in the nodes.
#-----------------------------------------------------------------

#SBATCH -J mympijob              # Job name
#SBATCH -o mympijob.%j.out       # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Queue name
#SBATCH -N 2                     # Total number of nodes requested (64 cores/node)
#SBATCH -n 128                   # Total number of tasks
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours

#SBATCH -A <account>             # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load gcc/6.3.0
module load mpi/intelmpi/2017

# Launch the executable
srun -n 128 <myexecutable>
</file>