GPU Queues
There are a number of different public partitions (SLURM lingo for 'queues') in the cluster that support GPU usage:
| Partition | Hosts | GPUs | RAM (MiB) | Access by |
|---|---|---|---|---|
| deeplearning | dgx[01-02] | V100 16G/32G | 11550 | project on Mogon II |
| m2_gpu | s[0001-0030] | 6 GeForce GTX 1080 Ti | 11550 | project on Mogon II |
Notes:
- RAM displays the default memory per node in MiB.
- GPUs are, after all, a relatively costly resource.
Access
To find out which account to use for the m2_gpu partition, log in and call:

    sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10
All accounts that show Partition=m2_gpu can be used to submit jobs to the GPU partition.
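Once you know which account to use, pass it to sbatch with -A. A minimal sketch (the account and job script names are placeholders):

    sbatch -A <your_m2_account> -p m2_gpu myjobscript.sh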
Limitations
The m2_gpu is a single partition allowing a runtime of up to 5 days. To prevent single users or groups from flooding the entire partition with long-running jobs, a limit has been set so that other users get the chance to run their jobs, too. This may result in pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.
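One way to check why your jobs are pending is to display the scheduler's reason field via squeue's standard format options, for example:

    # list your own jobs with job ID, partition, state and pending reason
    squeue -u $USER -o "%.18i %.9P %.8T %r"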
Compiling for GPUs
Running Own or Self-Compiled Code
Unlike the login nodes, the s-nodes have Intel CPUs, which means you have to compile your code on the GPU nodes; otherwise you may end up with illegal instruction errors or similar.
There is a partition m2_gpu-compile which allows running one job per user with a maximum of 8 cores, 1 CPU, and --mem=18000M for compiling your code. The maximum runtime for compile jobs is 60 minutes.
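For instance, an interactive compile session within these limits could be requested as follows (a sketch; the module and file names are placeholders):

    # request an interactive shell on the compile partition,
    # staying within the limits above (8 cores, 18000M memory, 1 hour)
    srun -p m2_gpu-compile -n 1 -c 8 --mem=18000M -t 01:00:00 --pty bash -i

    # then, inside that shell:
    module load system/CUDA
    nvcc -o <myexecutable> <mysource>.cu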
Submitting to the GPU-Partitions
To use a GPU you have to explicitly reserve it as a resource in the submission script:

    #!/bin/bash
    # ... other SBATCH statements
    #SBATCH --gres=gpu:<number>
    #SBATCH -p <appropriate partition>
The number can be anything from 1 to 6 on our GPU nodes, depending on the partition. In order to use more than one GPU, the application needs to support this, of course.
--gres-flags=enforce-binding is currently not working properly in our Slurm version. You may try to use it with a multi-task GPU job, but it will not work with jobs reserving only part of a node. SchedMD seems to be working on a bug fix.
Simple single GPU-Job
Take a single GPU node and run an executable on it.

    #!/bin/bash
    #-----------------------------------------------------------------
    # Example SLURM job script to run serial applications on Mogon.
    #
    # This script requests one task using 2 cores on one GPU node.
    #-----------------------------------------------------------------
    #SBATCH -J mysimplegpujob        # Job name
    #SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
    #SBATCH -p m2_gpu                # Partition name
    #SBATCH -n 1                     # Total number of tasks
    #SBATCH -c 2                     # CPUs per task
    #SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
    #SBATCH --gres=gpu:1             # Reserve 1 GPU
    #SBATCH -A m2_account            # Specify allocation to charge against

    # Load all necessary modules if needed (these are examples)
    # Loading modules in the script ensures a consistent environment.
    module load system/CUDA

    # Launch the executable
    srun <myexecutable>
Simple full node GPU-Job
Take a full GPU node and run an executable that uses all 6 GPUs.

    #!/bin/bash
    #-----------------------------------------------------------------
    # Example SLURM job script to run serial applications on Mogon.
    #
    # This script requests one task using all cores (48) on one node.
    # The job will have access to all the memory and all 6 GPUs in the node.
    #-----------------------------------------------------------------
    #SBATCH -J mysimplegpujob        # Job name
    #SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
    #SBATCH -p m2_gpu                # Partition name
    #SBATCH -N 1                     # Total number of nodes requested (48 cores/node per GPU node)
    #SBATCH -n 1                     # Total number of tasks
    #SBATCH -c 48                    # CPUs per task
    #SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
    #SBATCH --gres=gpu:6             # Reserve 6 GPUs
    #SBATCH -A m2_account            # Specify allocation to charge against

    # Load all necessary modules if needed (these are examples)
    # Loading modules in the script ensures a consistent environment.
    module load system/CUDA

    # Launch the executable
    srun <myexecutable>
Multi-task GPU-Job
Take a full GPU node and run 6 executables, each on one GPU.

    #!/bin/bash
    #-----------------------------------------------------------------
    # Example SLURM job script to run serial applications on MOGON.
    #
    # This script requests six tasks with 8 cores each (48 in total) on one node.
    # The job will have access to all the memory and all 6 GPUs in the node.
    #-----------------------------------------------------------------
    #SBATCH -J mysimplegpujob        # Job name
    #SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
    #SBATCH -p m2_gpu                # Partition name
    #SBATCH -N 1                     # Total number of nodes requested (48 cores/node per GPU node)
    #SBATCH -n 6                     # Total number of tasks
    #SBATCH -c 8                     # CPUs per task
    #SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
    #SBATCH --gres=gpu:6             # Reserve 6 GPUs
    #SBATCH -A m2_account            # Specify allocation to charge against

    # Load all necessary modules if needed (these are examples)
    # Loading modules in the script ensures a consistent environment.
    module load system/CUDA

    # Launch the tasks
    GPUTASKS=$(grep -o ',' <(echo $SLURM_JOB_GPUS) | wc -l)

    for ((i=0; i<=GPUTASKS; i++))
    do
       echo "TASK $i"
       srun -n 1 -c $SLURM_CPUS_PER_TASK --exclusive --gres=gpu:1 --mem=18G <executable> &
    done

    wait
Ignorant Applications -- or what if my program does not understand CUDA_VISIBLE_DEVICES?
Most GPU programs just know which device to select. Some do not. In any case, SLURM exports the environment variable CUDA_VISIBLE_DEVICES, which simply holds the comma-separated, enumerated devices allowed in a job environment, starting from 0.
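To see which devices your job has actually been assigned, a quick check along these lines can help (a sketch; the account and time limit are placeholders):

    # request two GPUs and print the devices Slurm exposes to the job
    srun -p m2_gpu -A m2_account --gres=gpu:2 -n 1 -t 00:05:00 \
        bash -c 'echo $CUDA_VISIBLE_DEVICES; nvidia-smi -L'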
So, when for instance another job occupies the first device and your job selects two GPUs, CUDA_VISIBLE_DEVICES might hold the value 1,2 and you can read this into an array:

    # good practice is to store the initial IFS setting:
    IFSbck=$IFS
    IFS=',' read -a devices <<< $CUDA_VISIBLE_DEVICES
    IFS=$IFSbck  # in case it is used in subsequent code
Now you can point your applications to the respective devices (assuming you start two applications rather than one that uses both):

    cmd --argument_which_receives_the_device ${devices[0]} &  # will hold the 1st
    cmd --argument_which_receives_the_device ${devices[1]} &  # will hold the 2nd
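Since both commands are started in the background, the job script should not exit before they have finished; appending a wait takes care of that:

    # block until all background processes launched above have finished
    wait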