===== GPU Queues =====
  
^ Partition          ^ Hosts                ^ GPUs                  ^ RAM          ^ Access by ^
| ''deeplearning''   | [[start:mogon_cluster:nodes|dgx[01-02]]]   | V100 16G/32G          | 11550        | project on Mogon II |
| ''m2_gpu''         | [[start:mogon_cluster:nodes|s[0001-0030]]] | 6 GeForce GTX 1080 Ti | 11550        | project on Mogon II |
  
  
Notes: 
  * RAM displays the default memory per node in MiB.
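For a quick overview of these partitions from a login node, ''sinfo'' can be used. The following is a minimal sketch (the exact output columns depend on the local Slurm configuration); ''<account>'' is a placeholder for your own project account, see the Access section below.

<code bash>
# List node count, generic resources (GPUs) and memory per node
# for the GPU partitions shown in the table above.
sinfo -p m2_gpu,deeplearning -o "%15P %6D %20G %10m"

# Example: request a single GPU interactively for one hour;
# <account> is a placeholder (see "Access" below).
salloc -p m2_gpu -A <account> --gres=gpu:1 -t 01:00:00
</code>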
  
 <callout type="warning" icon="true"> <callout type="warning" icon="true">
===== Access =====
  
  
To find out which account to use for the ''m2_gpu'' partition, log in and call
<code bash>
sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10
</code>
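The account reported there can then be passed to ''sbatch'' or ''srun'' together with the partition and a GPU request. A minimal sketch, where ''m2_myproject'' and ''my_gpu_job.sh'' are placeholders, not real names:

<code bash>
# Submit a job script to the m2_gpu partition under the account
# reported by sacctmgr; account name and script name are placeholders.
sbatch -A m2_myproject -p m2_gpu --gres=gpu:1 my_gpu_job.sh
</code>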
===== Limitations =====
  
The ''m2_gpu'' is a single partition allowing a runtime of up to 5 days. To prevent single users or groups from flooding the entire partition with long-running jobs, a limit has been set so that other users also get the chance to run their jobs. This may result in pending reasons such as ''QOSGrpGRESRunMinutes''. For other pending reasons, see [[:start:working_on_mogon:slurm_manage|our page on job management]].
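Whether your own jobs are affected by this limit can be checked with ''squeue''; a minimal sketch:

<code bash>
# Show job ID, partition, state and pending reason for your jobs in m2_gpu;
# QOSGrpGRESRunMinutes in the reason column indicates the group limit above.
squeue -u $USER -p m2_gpu -o "%.10i %.10P %.8T %.30r"
</code>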
  
===== Compiling for GPUs =====
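A minimal sketch of compiling a CUDA source file with ''nvcc'' after loading the ''system/CUDA'' module used in the job scripts below; the source and output file names are placeholders:

<code bash>
# Load the CUDA toolchain and compile a CUDA source file;
# "my_kernel.cu" and "my_gpu_program" are placeholder names.
module load system/CUDA
nvcc -O2 -o my_gpu_program my_kernel.cu
</code>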
# Example SLURM job script to run serial applications on Mogon.
#
# This script requests one task using cores on one GPU node.
#-----------------------------------------------------------------
  
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
  
# Launch the executable
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
  
# Launch the executable
#SBATCH -N 1                     # Total number of nodes requested (48 cores per GPU node)
#SBATCH -n 6                     # Total number of tasks
#SBATCH -c 8                     # CPUs per task (6 tasks x 8 CPUs = 48 cores)
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:6             # Reserve 6 GPUs
# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA
  
# Launch the tasks