Job Arrays

According to the Slurm Job Array Documentation, “job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily.” In general, job arrays are useful for applying the same processing routine to a collection of multiple input data files. Job arrays offer a very simple way to submit a large number of independent processing jobs.

We strongly recommend using parallel processing in addition and treating job arrays as a convenience feature, without neglecting performance optimization.

By submitting a single job array sbatch script, a specified number of “array-tasks” will be created based on this “master” sbatch script. An example job array script is given below:

#!/bin/bash
 
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out # redirecting stdout
#SBATCH --error=arrayJob_%A_%a.err  # redirecting stderr
#SBATCH --array=1-16 
#SBATCH --time=01:00:00
#SBATCH --partition=short # for mogon I
#SBATCH --ntasks=1        # number of tasks per array job
#SBATCH --mem-per-cpu=4000
 
 
######################
# Begin work section #
######################
 
# Print this sub-job's task ID
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
 
# Do some work based on the SLURM_ARRAY_TASK_ID
# For example: 
# ./my_process $SLURM_ARRAY_TASK_ID
# 
# where my_process is your executable

In the above example, the --array=1-16 option will cause 16 array-tasks (numbered 1, 2, …, 16) to be spawned when this master job script is submitted. The “array-tasks” are simply copies of this master script that are automatically submitted to the scheduler on your behalf. However, in each array-task an environment variable called SLURM_ARRAY_TASK_ID will be set to a unique value (in this example, a number in the range 1, 2, …, 16). In your script, you can use this value to select, for example, a specific data file that each array-task will be responsible for processing.
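A minimal sketch of this pattern is given below; the input file names (input_1.dat, …, input_16.dat) and the executable my_process are hypothetical placeholders, not files provided by the cluster:

# Select this array-task's input file based on its task ID
# (assumes files input_1.dat ... input_16.dat exist in the submit directory)
INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
 
# Process it and write a per-task result file
./my_process "$INPUT" > "result_${SLURM_ARRAY_TASK_ID}.out"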

Job array indices can be specified in a number of ways. For example:

#A job array with index values between 0 and 31:
#SBATCH --array=0-31
 
#A job array with index values of 1, 2, 5, 19, 27:
#SBATCH --array=1,2,5,19,27
 
#A job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5, 7):
#SBATCH --array=1-7:2

The %A_%a construct in the output and error file names is used to generate unique output and error files based on the master job ID (%A) and the array-task ID (%a). In this fashion, each array-task will be able to write to its own output and error file.
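For instance, assuming the master job above were assigned the (hypothetical) job ID 4711, the array-tasks would write to files such as:

arrayJob_4711_1.out
arrayJob_4711_2.out
...
arrayJob_4711_16.out

along with the corresponding .err files.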

An alternative for running many independent tasks within a single job is the --multi-prog option of srun, which allows you to assign a different executable and arguments to each parallel task in your job.

Create your submission script with the basic details; for example, call it job.sh:

#!/bin/sh
#SBATCH -n 16           # 16 cores
#SBATCH -t 1-03:00:00   # 1 day and 3 hours
#SBATCH -p compute      # partition name
#SBATCH -J my_job_name  # sensible name for the job
 
 
srun --multi-prog test.config

The file test.config contains the parameters required by the --multi-prog option.

The configuration file contains three fields, separated by blanks. These fields are:

  • Task number
  • Executable File
  • Argument

Available parameters:

  • %t - The task number of the responsible task
  • %o - The task offset (task's relative position in the task range).

An example configuration file :

0-3             hostname        
4,5             echo            task:%t
6               echo            task:%t-%o
7               echo            task:%o
8-15            hostname        

Please note that if you're using a custom executable, you should supply the full path to the file.

For example:

0   /home/users/jbloggs/bin/my_bin input1
1   /home/users/jbloggs/bin/my_bin input2
...

The module tools/staskfarm/0.1 provides a shortcut for writing the otherwise cumbersome configuration files. The script comes with a help message:

$ staskfarm -h

It can be used within a Slurm environment, e.g.:

#SBATCH -N 2
#SBATCH ...
#SBATCH --ntasks=x
 
# preliminary tasks go here
 
module load tools/staskfarm/0.1
 
taskfarm <command> [filename(s)]

Note that --ntasks needs to be set to a number x that is lower than or equal to the number of processors provided by the -N nodes, divided by the number of threads per task.
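For example, assuming (hypothetically) that -N 2 provides 2 nodes with 64 cores each and that every task runs 4 threads, --ntasks should be set to at most (2 × 64) / 4 = 32.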

The code is hosted on GitHub, where a nicer README layout is available.
