Using GPUs
Using graphics processing units for computations on MOGON
GPU Queues
Several public partitions (SLURM lingo for ‘queues’) on the cluster support GPU usage:
Partition | Hosts | GPUs | RAM | Access by |
---|---|---|---|---|
deeplearning | dgx[01-02] | V100 16G/32G | 11550 | project on Mogon II |
m2_gpu | s[0001-0027] | 6 GeForce GTX 1080 Ti | 11550 | project on Mogon II |
Notes:
- RAM displays the default memory per node in MiB.
Calculating on GPU nodes without using the accelerators / GPUs is prohibited!
We reserve the right to disable accounts that abuse these resources. GPUs are, after all, a relatively costly resource.
Access
To find out which account to use for the m2_gpu partition, log in and call
sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10
All accounts that show Partition=m2_gpu can be used to submit jobs to the GPU partition.
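For use in scripts, the account names can be extracted from the `sacctmgr` output. The following is a minimal sketch, assuming the `-n` (no header) and `-P` (pipe-separated output) options of `sacctmgr`; the helper name `gpu_accounts` is made up for this illustration.

```shell
#!/bin/bash
# gpu_accounts (hypothetical helper): reads pipe-separated lines of the form
#   User|Account|Partition
# on stdin and prints each account associated with m2_gpu exactly once.
gpu_accounts() {
    awk -F'|' '$3 == "m2_gpu" { print $2 }' | sort -u
}

# Intended use on a login node (requires sacctmgr, hence commented out here):
# sacctmgr -n -P list user "$USER" -s where Partition=m2_gpu \
#     format=User,Account,Partition | gpu_accounts
```

Any of the printed names can then be passed to `#SBATCH -A`.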
Limitations
The m2_gpu partition is a single partition allowing a runtime of up to 5 days. To prevent single users or groups from flooding the entire partition with long-running jobs, a limit has been set so that other users get a chance to run their jobs, too. This may result in pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.
Compiling for GPUs
There is a partition m2_gpu-compile which allows running one job per user with at most 8 cores, 1 CPU, and --mem=18000M for compiling your code. The maximum runtime for compile jobs is 60 minutes.
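A compile job might look like the following sketch. It writes a job script that stays within the limits above to a file, which can then be submitted with `sbatch`. The file names `compile_job.sh` and `myprogram.cu`, the account `m2_account`, and the assumption that `module load system/CUDA` provides `nvcc` are illustrative and not confirmed for this partition.

```shell
#!/bin/bash
# Write a compile job script that stays within the partition limits
# (8 cores, 18000M memory, 60 minutes). All names are placeholders.
cat > compile_job.sh <<'EOF'
#!/bin/bash
#SBATCH -J compilejob          # Job name
#SBATCH -p m2_gpu-compile      # Compile partition
#SBATCH -n 1                   # One task
#SBATCH -c 8                   # Up to 8 cores
#SBATCH --mem=18000M           # Memory limit of the partition
#SBATCH -t 01:00:00            # Maximum runtime for compile jobs
#SBATCH -A m2_account          # Replace with your own account

module load system/CUDA        # Assumed to provide nvcc
nvcc -O2 -o myprogram myprogram.cu
EOF

# Submit on a login node with:
# sbatch compile_job.sh
```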
Submitting to the GPU-Partitions
To use a GPU you have to explicitly reserve it as a resource in the submission script:
#!/bin/bash
# ... other SBATCH statements
#SBATCH --gres=gpu:<number>
#SBATCH -p <appropriate partition>
<number>
can be anything from 1 to 6 on our GPU nodes, depending on the partition. To use more than one GPU, the application needs to support using that many, of course.
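When submission commands are generated by a wrapper script, it can help to check the requested GPU count before calling `sbatch`. The sketch below assumes the 1 to 6 range stated above; `gres_option` is a hypothetical helper name, and the valid range may differ per partition.

```shell
#!/bin/bash
# gres_option (hypothetical helper): prints the --gres option for a GPU count,
# or fails if the count is outside the 1-6 range assumed here.
gres_option() {
    local n=$1
    if [[ "$n" =~ ^[1-6]$ ]]; then
        echo "--gres=gpu:$n"
    else
        echo "error: GPU count must be between 1 and 6, got '$n'" >&2
        return 1
    fi
}

gres_option 2    # prints --gres=gpu:2
```

The result can be passed on the command line, for example `sbatch $(gres_option 2) -p m2_gpu jobscript.sh`, where `jobscript.sh` stands for your own script.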
--gres-flags=enforce-binding is currently not working properly in our Slurm version. You may try to use it with a multi-task GPU job, but it won’t work with jobs reserving only part of a node. SchedMD seems to be working on a bug fix.
Simple single GPU-Job
Take a single GPU-node and run an executable on it.
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one task using 2 cores on one GPU-node.
#-----------------------------------------------------------------
#SBATCH -J mysimplegpujob # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p m2_gpu # Partition name
#SBATCH -n 1 # Total number of tasks
#SBATCH -c 2 # CPUs per task
#SBATCH -t 00:30:00 # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:1 # Reserve 1 GPU
#SBATCH -A m2_account # Specify allocation to charge against
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
# Launch the executable
srun <myexecutable>
Simple full node GPU-Job
Take a full GPU-node and run an executable that uses all 6 GPUs.
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one task using all cores (48) on one node.
# The job will have access to all the memory and all 6 GPUs in the node.
#-----------------------------------------------------------------
#SBATCH -J mysimplegpujob # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p m2_gpu # Partition name
#SBATCH -N 1 # Total number of nodes requested (48 cores/node per GPU node)
#SBATCH -n 1 # Total number of tasks
#SBATCH -c 48 # CPUs per task
#SBATCH -t 00:30:00 # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:6 # Reserve 6 GPUs
#SBATCH -A m2_account # Specify allocation to charge against
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
# Launch the executable
srun <myexecutable>
Multi-task GPU-Job
Take a full GPU-node and run 6 executables each on one GPU.
#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on MOGON.
#
# This script requests six tasks using 8 cores each (48 in total) on one node.
# The job will have access to all the memory and all 6 GPUs in the node.
#-----------------------------------------------------------------
#SBATCH -J mysimplegpujob # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p m2_gpu # Partition name
#SBATCH -N 1 # Total number of nodes requested (48 cores/node per GPU node)
#SBATCH -n 6 # Total number of tasks
#SBATCH -c 8 # CPUs per task
#SBATCH -t 00:30:00 # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:6 # Reserve 6 GPUs
#SBATCH -A m2_account # Specify allocation to charge against
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
# Launch the tasks
# Number of GPUs = number of commas in the device list + 1
GPUTASKS=$(( $(grep -o ',' <<< "$SLURM_JOB_GPUS" | wc -l) + 1 ))
for ((i=0; i<GPUTASKS; i++))
do
echo "TASK $i"
srun -n 1 -c $SLURM_CPUS_PER_TASK --exclusive --gres=gpu:1 --mem=18G <executable> &
done
wait
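The number of tasks to launch equals the number of entries in the device list, which is one more than the number of commas. A minimal sketch of counting the entries directly, assuming `SLURM_JOB_GPUS` holds a plain comma-separated list without ranges; `count_gpus` is a hypothetical helper name:

```shell
#!/bin/bash
# count_gpus (hypothetical helper): counts the entries of a comma-separated
# device list such as the one assumed to be stored in SLURM_JOB_GPUS.
count_gpus() {
    local list=$1
    local -a ids
    IFS=',' read -r -a ids <<< "$list"
    echo "${#ids[@]}"
}

# Outside of a job SLURM_JOB_GPUS is unset; fall back to a sample value.
count_gpus "${SLURM_JOB_GPUS:-0,1,2,3,4,5}"    # the sample list has 6 entries
```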
Ignorant Applications, or: What if my Program does not understand CUDA_VISIBLE_DEVICES?
Most GPU programs just know which device to select. Some do not. In any case, SLURM exports the environment variable CUDA_VISIBLE_DEVICES, which simply holds the comma-separated, enumerated devices allowed in a job environment, starting from 0.
So when, for instance, another job occupies the first device and your job selects two GPUs, CUDA_VISIBLE_DEVICES might hold the value 1,2 and you can read this into an array (with a so-called here string):
# good practice is to store the initial IFS setting:
IFSbck=$IFS
# -r keeps backslashes literal; quoting avoids word splitting of the value
IFS=',' read -r -a devices <<< "$CUDA_VISIBLE_DEVICES"
IFS=$IFSbck # in case it is used in subsequent code
Now, you can point your applications to the respective devices (assuming you start two and not one, which uses both):
cmd --argument_which_receives_the_device ${devices[0]} & # will hold the 1st
cmd --argument_which_receives_the_device ${devices[1]} & # will hold the 2nd
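If a program offers no option to select a device at all, one common approach is to restrict what each process can see by setting CUDA_VISIBLE_DEVICES for that process only. The sketch below illustrates this under stated assumptions: `run_per_device` is a hypothetical helper name, the fallback value `1,2` is only there so the sketch can be tried outside a job, and the `bash -c 'echo ...'` command is a stand-in for a real application.

```shell
#!/bin/bash
# run_per_device (hypothetical helper): starts one background copy of the
# given command per device in CUDA_VISIBLE_DEVICES, each seeing only one GPU.
run_per_device() {
    local -a devices
    IFS=',' read -r -a devices <<< "$CUDA_VISIBLE_DEVICES"
    local dev
    for dev in "${devices[@]}"; do
        # The assignment prefix changes the variable for this process only.
        CUDA_VISIBLE_DEVICES=$dev "$@" &
    done
    wait    # block until all background processes have finished
}

# Outside of a job the variable is unset; fall back to a sample value.
CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-1,2}
export CUDA_VISIBLE_DEVICES

# Stand-in for a real application: each copy reports the device it sees.
run_per_device bash -c 'echo "worker sees device $CUDA_VISIBLE_DEVICES"'
```

Because the processes run in the background, the order of their output is not guaranteed.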