====== GPU Queues ======
There are three different [[partitions|partitions (SLURM lingo for 'queues')]] inside the cluster that support GPU usage: the titan-Queues (''titanshort/long'') currently include the hosts i0001-i0009, while the gpu-Queue (''infogpu'') includes the hosts g0001-g0009 ((Formerly, the ''gpushort/long'' queues ran on these nodes; access is now restricted.)). The titan hosts each carry 4 GeForce GTX TITAN cards, hence up to 4 GPUs per node can be requested (see below). In contrast, GeForce GTX 480 cards are installed on the gpu hosts (''infogpu''). Finally, the tesla-Queues (''teslashort/long'') provide 4 Tesla K20m cards per node.
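To check which hosts and GPU resources (GRES) a partition currently provides, SLURM's ''sinfo'' can be queried. The partition name below is only an example taken from the list above; substitute the one you intend to use:

<code bash>
# Show partition name (%P), node list (%N) and generic resources (%G);
# "titanshort" is an example partition from the list above.
sinfo -p titanshort -o "%P %N %G"
</code>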
====== GPU Usage ======
To use a GPU you have to explicitly reserve it as a resource in the submission script:
<code bash>
#!/bin/bash
# ... other SBATCH statements
#SBATCH --gres=gpu:<number>
#SBATCH -p <appropriate partition>
</code>
''<number>'' can be anything from 1 to 4 on our GPU nodes. In order to use more than one GPU, the application itself must of course support multi-GPU execution.
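Putting this together, a minimal single-GPU job script could look as follows. The partition name ''titanshort'' and the program name ''my_program'' are placeholders, not fixed values:

<code bash>
#!/bin/bash
#SBATCH -J gpu_test            # job name
#SBATCH -p titanshort          # example partition; pick a GPU partition from the list above
#SBATCH --gres=gpu:1           # reserve one GPU on the node
#SBATCH -n 1                   # one task
#SBATCH -t 00:30:00            # 30 minutes wall clock time

# SLURM exposes only the reserved GPU(s) to the job (CUDA_VISIBLE_DEVICES).
srun ./my_program
</code>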
===== Using multiple nodes and multiple GPUs =====
In order to use multiple nodes, you have to request more than one node.
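A sketch of a two-node job that claims all four GPUs on each node, assuming an MPI-capable application; the partition and program names are again placeholders:

<code bash>
#!/bin/bash
#SBATCH -p titanshort          # example partition
#SBATCH --nodes=2              # two GPU nodes
#SBATCH --gres=gpu:4           # all four GPUs on each node
#SBATCH --ntasks-per-node=1    # e.g. one MPI rank per node driving 4 GPUs
#SBATCH -t 00:30:00            # 30 minutes wall clock time

srun ./my_mpi_gpu_program
</code>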