
General Notes

On MOGON we differentiate between public partitions (those readily visible with sinfo) and non-public ones. The latter have restricted access, are set to be hidden, and will not be described here.

Detailed information on partitions can be retrieved with

scontrol show partition <partition_name>
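
For example, to inspect the parallel partition listed below:

scontrol show partition parallel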

Quality of service (QoS) values can be viewed with

sacctmgr show qos <qos_of_that_partition_name>
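
The QoS attached to a partition appears in the QoS= field of the scontrol output above. A minimal sketch, assuming the QoS is named parallel (replace with the actual name) and restricting the output to a few fields:

sacctmgr show qos parallel format=Name,MaxWall,MaxTRESPU,MaxJobsPU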

Information regarding jobs running or pending within a partition can be obtained with

squeue -p <partition_name>

while a status overview is given by

sinfo -p <partition_name>
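
For instance, for the parallel partition (the sinfo format string is only an illustration):

squeue -p parallel -u $USER
sinfo -p parallel -o "%P %a %l %D %t"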

In SLURM a partition can be selected in your jobscript with

#SBATCH -p <partition_name>

or on the command line: $ sbatch -p <partition_name> … <jobscript>
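
A minimal jobscript sketch putting this together (account, resource values and the program name are placeholders):

#!/bin/bash
#SBATCH -p parallel              # partition, see the tables below
#SBATCH -A <your_account>        # your SLURM account
#SBATCH -N 1                     # number of nodes
#SBATCH -t 01:00:00              # wall time
#SBATCH -J example_job           # job name

srun ./my_program                # placeholder executable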

Several partitions can be selected with

#SBATCH -p <partition1>,<partition2>

This can be useful for users with access to "private" hardware: it allows a job to be scheduled onto general-purpose hardware when the group-owned hardware is occupied, as in the example below.
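
For example, with a placeholder name for the group-owned partition and parallel as the public fallback:

#SBATCH -p <group_partition>,parallel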

The partition constraints below will occasionally cause SLURM to leave a job pending. Common pending reasons are described here.
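
The reason for a pending job is shown in the NODELIST(REASON) column of squeue; a sketch to list only pending jobs with their reasons:

squeue -u $USER -t PENDING -o "%.18i %.9P %.8j %.20R"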

MOGON II

Only about 5% of the nodes are available for small jobs (n ≪ 40). Each account has a GrpTRESRunLimit, which can be checked with

sacctmgr -s list account <your_account> format=Account,GrpTRESRunMin

Your accounts can be listed with

sacctmgr -n -s list user $USER format=Account | grep -v none

The default is cpu=22982400, which is the equivalent of using 700 nodes for 12 hours in total:

Partition | Nodes            | Max wall time | Node memory                                           | Interconnect    | Accelerators | Comment
parallel  | z-nodes, x-nodes | 5 days        | 64GB, 96GB, 128GB, 192GB, 256GB nodes                 | Intel Omni-Path | -            | jobs using n*40 or n*64
smp       | z-nodes, x-nodes | 5 days        | up to 5% of the 64GB, 96GB, 128GB, 192GB, 256GB nodes | Intel Omni-Path | -            | jobs using n ≪ 40 or n ≪ 64, max. 3,000 running jobs per user
bigmem    | z-nodes, x-nodes | 5 days        | 384GB, 512GB, 1TB, 1.5TB nodes                        | Intel Omni-Path | -            | 256GB or more memory
devel     | z-nodes, x-nodes | 4 hours       | 10 nodes (64GB, 96GB, 128GB)                          | Intel Omni-Path | -            | max. 2 jobs per user, max. 320 CPUs in total

Partition    | Nodes     | Max wall time | Interconnect | Accelerators                   | Comment
m2_gpu       | s-nodes   | 5 days        | InfiniBand   | 6 GeForce GTX 1080 Ti per node | -
deeplearning | dgx-nodes | 12 hours      | InfiniBand   | 8 Tesla V100-SXM2 per node     | for access, get in touch with us
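
A full-node job in the parallel partition could be sketched as follows, assuming 40 CPUs per broadwell node and 64 per skylake node in line with the n*40 / n*64 comment above (all values are illustrative):

#SBATCH -p parallel
#SBATCH -N 2                      # two full nodes
#SBATCH --ntasks-per-node=40      # 40 on broadwell nodes, 64 on skylake nodes
#SBATCH -t 1-00:00:00             # within the 5 day limit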

For the parallel partition we find:

Memory [MiB] | No. of nodes 1) | Type
57000        | 584             | broadwell
88500        | 576             | skylake
120000       | 120             | broadwell
177000       | 120             | skylake
246000       | 40              | broadwell
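
To steer a job towards the larger-memory nodes of the parallel partition, a memory request can be added, e.g. to exclude the 57000, 88500 and 120000 MiB nodes (the value is illustrative):

#SBATCH -p parallel
#SBATCH --mem=150000M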

Likewise for the bigmem partition:

Memory [MiB] | No. of nodes 2) | Type
354000       | 32              | skylake
498000       | 20              | broadwell
1002000      | 2               | broadwell
1516000      | 2               | skylake

Partition    | Nodes                        | Max wall time | Node memory | Interconnect    | Accelerators | Comment
himster2_exp | x0753 - x0794, x2001 - x2023 | 5 days        | 96GB        | Intel Omni-Path | -            | -
himster2_th  | x2024 - x2320                | 5 days        | 96GB        | Intel Omni-Path | -            | -

Hidden Partitions

Information on hidden partitions can be viewed by anyone. These partitions are set to be hidden to avoid cluttering the output of every poll; they are "private" to certain projects/groups and of interest only to those groups.

To display all jobs of a user across all partitions, supply the -a flag:

$ squeue -u $USER -a

Likewise, sinfo can be supplemented with -a to gather information. All other commands work as expected without this flag.
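
For example, to list all partitions, including hidden ones, with their availability, time limit and node count (the format string is only an illustration):

$ sinfo -a -o "%P %a %l %D"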

Out of Service

The following link lists clusters that have been taken out of service for various reasons:

Out of Service Clusters


1), 2) if all nodes are functional