Job Arrays

According to the Slurm Job Array Documentation, “job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.” In general, job arrays are useful for applying the same processing routine to a collection of multiple input data files. Job arrays offer a very simple way to submit a large number of independent processing jobs.

We strongly recommend using parallel processing within each array-task where possible, and treating job arrays as a convenience feature rather than a substitute for performance optimization.

By submitting a single job array sbatch script, a specified number of “array-tasks” will be created based on this “master” sbatch script. An example job array script is given below:

#!/bin/bash
 
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out # redirecting stdout
#SBATCH --error=arrayJob_%A_%a.err  # redirecting stderr
#SBATCH --array=1-16 
#SBATCH --time=01:00:00
#SBATCH --partition=short # for mogon I
#SBATCH --ntasks=1        # number of tasks per array job
#SBATCH --mem-per-cpu=4000
 
 
######################
# Begin work section #
######################
 
# Print this sub-job's task ID
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
 
# Do some work based on the SLURM_ARRAY_TASK_ID
# For example: 
# ./my_process $SLURM_ARRAY_TASK_ID
# 
# where my_process is your executable

In the above example, the --array=1-16 option causes 16 array-tasks (numbered 1, 2, …, 16) to be spawned when this master job script is submitted. The array-tasks are simply copies of this master script that are automatically submitted to the scheduler on your behalf. However, in each array-task an environment variable called SLURM_ARRAY_TASK_ID is set to a unique value (in this example, a number in the range 1, 2, …, 16). In your script, you can use this value to select, for example, a specific data file that each array-task will be responsible for processing.
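For instance, the task ID can be mapped to an input file name. The sketch below is illustrative only; the input_<N>.dat naming scheme is an assumption, not something prescribed by Slurm:

```shell
#!/bin/bash
# Illustrative sketch: derive this array-task's input file from its task ID.
# The input_<N>.dat naming scheme is an assumption for demonstration purposes.
# Fall back to 1 so the snippet also runs outside of a Slurm allocation.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
INPUT_FILE="input_${SLURM_ARRAY_TASK_ID}.dat"
echo "Array task ${SLURM_ARRAY_TASK_ID} processes ${INPUT_FILE}"
# ./my_process "${INPUT_FILE}"
```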

Job array indices can be specified in a number of ways. For example:

#A job array with index values between 0 and 31:
#SBATCH --array=0-31
 
#A job array with index values of 1, 2, 5, 19, 27:
#SBATCH --array=1,2,5,19,27
 
#A job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5, 7):
#SBATCH --array=1-7:2

The %A_%a construct in the output and error file names is used to generate unique output and error files based on the master job ID (%A) and the array-task ID (%a). In this fashion, each array-task will be able to write to its own output and error file.
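To see how the placeholders expand, the following sketch mimics the substitution Slurm performs on the filename pattern; the job ID 12345 and task ID 7 are made-up example values:

```shell
#!/bin/bash
# Mimic Slurm's %A/%a substitution in the --output pattern.
# 12345 and 7 are made-up example values for the master job ID and task ID.
MASTER_JOB_ID=12345
ARRAY_TASK_ID=7
pattern="arrayJob_%A_%a.out"
fname="${pattern/\%A/$MASTER_JOB_ID}"   # -> arrayJob_12345_%a.out
fname="${fname/\%a/$ARRAY_TASK_ID}"     # -> arrayJob_12345_7.out
echo "$fname"
```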

It is possible to limit the number of concurrently executing array-tasks of an array, e.g. to reduce I/O load, with this syntax:

#SBATCH --array=1-1000%50

where at most 50 array-tasks of the 1000 would run concurrently.

The --multi-prog option of srun allows you to assign a different executable (and arguments) to each parallel task in your job. More information can be found at our wiki page on node-local scheduling.
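A minimal sketch of how --multi-prog is typically used: a configuration file maps task ranks (or rank ranges) to commands. The file name multi.conf and the programs ./master and ./worker are placeholder assumptions:

```shell
#!/bin/bash
# Write a hypothetical multi-prog configuration file: one line per task
# rank (or rank range), followed by the command that rank should run.
# %t expands to the task's rank at launch time.
cat > multi.conf <<'EOF'
0    ./master
1-3  ./worker %t
EOF
# Launch 4 tasks; each runs the program mapped to its rank (requires Slurm):
# srun --ntasks=4 --multi-prog multi.conf
```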

  • Last modified: 2018/03/12 21:32
  • by meesters