GPU Queues

There are a number of different public partitions (SLURM lingo for 'queues') in the cluster that support GPU usage:

Partition   Hosts           GPUs                            RAM     Access by
titan*      i[0001-0009]    4 GeForce GTX Titan / node      48700   project on Mogon I
tesla*      h[0001-0004]    4 Tesla K20m / node             49000   project on Mogon I
m2_gpu      s[0001-0030]    6 GeForce GTX 1080 Ti / node    11550   project on Mogon II

Physically, all GPU nodes are located alongside Mogon I; hence users need to log in to Mogon I even to use the m2_gpu partition.

Notes:

  • RAM denotes the default memory per node in MiB.
  • The Mogon I titan/tesla nodes are available as *short and *long queues, with maximum run times of 5 hours and 5 days, respectively (see the query example below).
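
If you are unsure about the concrete partition names and their time limits, you can query SLURM directly; the grep pattern below is merely an illustration:

sinfo -o "%P %l %N" | grep -E 'titan|tesla|m2_gpu'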

The Mogon I titan/tesla nodes are no longer maintained - the number of nodes is steadily declining. Eventually, they will be phased out.

Calculating on GPU nodes without using the accelerators / GPUs is prohibited. We reserve the right to disable an account that abuses these resources (i.e. merely using the slightly faster CPUs).

GPUs, after all, are a relatively costly resource.

Access

The accelerators (GPUs) of Mogon II are located on the ZDV premises and are hence part of the Mogon I infrastructure. That is to say, you have to log in to Mogon I but use your Mogon II account (-A m2_*) to access the 189 GPUs in the m2_gpu partition. The tesla/titan partitions are accessible by all accounts.

To find out which account to use for the m2_gpu partition, log in to Mogon I and call:

sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10

All accounts that show Partition=m2_gpu can be used to submit jobs to the GPU-Partition.
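
With a suitable account, a job for the GPU partition can then be submitted along the following lines; <account> and <jobscript> are placeholders:

sbatch -A <account> -p m2_gpu --gres=gpu:1 <jobscript>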

Every group interested in using these GPUs that does not have access already can apply for it via the AHRP website.

Limitations

The m2_gpu partition is a single partition1) allowing a runtime of up to 5 days. In order to prevent single users or groups from flooding the entire partition with their long-running jobs, a limit has been set so that other users get the chance to run their jobs, too. This may result in pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.
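
To see why a job is (still) pending, you can ask SLURM for the pending reason directly; the output format string below is only one possible choice:

squeue -u $USER -p m2_gpu -o "%.12i %.10P %.10T %.30r"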

Compiling for GPUs

Running Own or Self-Compiled Code

Unlike the login nodes, the s-nodes have Intel CPUs, which means that you have to compile your code on the GPU nodes; otherwise you may end up with illegal-instruction errors or similar.

There is a partition m2_gpu-compile which allows one job per user with a maximum of 8 cores, 1 CPU and --mem=18000M for compiling your code. The maximum runtime for compile jobs is 60 minutes.
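
For example, an interactive compile session within these limits could be requested roughly as follows; the resource values are illustrative and <account> is a placeholder:

# request an interactive shell on the compile partition for up to 1 hour
srun -p m2_gpu-compile -A <account> -n 1 -c 8 --mem=18000M -t 01:00:00 --pty bash -i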

Submitting to the GPU-Partitions

To use a GPU you have to explicitly reserve it as a resource in the submission script:

#!/bin/bash
# ... other SBATCH statements
#SBATCH --gres=gpu:<number>
#SBATCH -p <appropriate partition>

<number> can be anything from 1 to 6 on our GPU nodes, depending on the partition (cf. the table above). In order to use more than 1 GPU, the application needs to support that many, of course.
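
Filled in, the relevant lines could read, for example (partition and GPU count are only examples consistent with the table above):

#SBATCH -p m2_gpu        # example partition
#SBATCH --gres=gpu:2     # example: reserve 2 of the 6 GPUs of a node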

--gres-flags=enforce-binding is currently not working properly in our SLURM version. You may try to use it with a multi-task GPU job (see the example below), but it won't work with jobs reserving only part of a node. SchedMD seems to be working on a bug fix.

Simple GPU-Job

Take a full GPU-node and run an executable that uses all 6 GPUs 2).

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run a single-task GPU application on Mogon.
#
# This script requests one task using all cores (48) on one node. 
# The job will have access to all the memory and all 6 GPUs in the node.  
#-----------------------------------------------------------------
 
#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p m2_gpu                # Partition name
#SBATCH -N 1                     # Total number of nodes requested (GPU nodes have 48 cores each)
#SBATCH -n 1                     # Total number of tasks 
#SBATCH -c 48                    # CPUs per task 
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:6             # Reserve 6 GPUs 
 
#SBATCH -A m2_account            # Specify allocation to charge against
 
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA/9.1.85
 
# Launch the executable
srun <myexecutable>
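
Assuming the script has been saved as mysimplegpujob.slurm (the name is arbitrary), it is submitted with:

sbatch mysimplegpujob.slurm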

Multi-task GPU-Job

Take a full GPU-node and run 6 executables each on one GPU.

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run multiple GPU tasks on Mogon.
#
# This script requests six tasks with 4 cores each on one node.
# The job will have access to all 6 GPUs in the node.
#-----------------------------------------------------------------
 
#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p m2_gpu                # Partition name
#SBATCH -N 1                     # Total number of nodes requested (GPU nodes have 48 cores each)
#SBATCH -n 6                     # Total number of tasks 
#SBATCH -c 4                     # CPUs per task 
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:6             # Reserve 6 GPUs 
 
#SBATCH -A m2_account            # Specify allocation to charge against
 
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA/9.1.85
 
# Launch the tasks
# SLURM_JOB_GPUS holds a comma-separated list of the GPUs assigned to the job;
# counting the commas yields (number of GPUs - 1), so the loop below runs once per GPU.
GPUTASKS=$(grep -o ',' <(echo "$SLURM_JOB_GPUS") | wc -l)
for ((i=0; i<=GPUTASKS; i++))
do
   echo "TASK $i"
   # start one task per GPU in the background; each gets 1 GPU, 4 CPUs and 18G of memory
   srun -n 1 -c $SLURM_CPUS_PER_TASK --exclusive --gres=gpu:1 --mem=18G <executable> &
done
 
wait

Ignorant Applications -- or what if my program does not understand CUDA_VISIBLE_DEVICES?

Most GPU programs figure out on their own which device to select. Some do not. In any case, SLURM exports the environment variable CUDA_VISIBLE_DEVICES, which holds the comma-separated list of devices available within the job environment, enumerated starting from 0.

So, when for instance another job occupies the first device and your job requests two GPUs, CUDA_VISIBLE_DEVICES might hold the value 1,2, which you can read into an array3):

# good practice is to store the initial IFS setting:
IFSbck=$IFS
IFS=',' read -r -a devices <<< "$CUDA_VISIBLE_DEVICES"
IFS=$IFSbck # restore it, in case it is used in subsequent code

Now, you can point your applications to the respective devices (assuming you start two processes rather than one that uses both):

cmd --argument_which_receives_the_device ${devices[0]} & # will get the 1st device
cmd --argument_which_receives_the_device ${devices[1]} & # will get the 2nd device
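
Since both commands are started in the background, the script should wait for them to finish before it exits:

wait # block until both background processes have finished
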
1) In contrast to the Mogon I short/long scheme.
2) Be sure that your application can utilize more than 1 GPU when you request it!
3) with a so-called HERE string