partitions


General Notes

On Mogon we differentiate between public partitions (those readily visible with sinfo) and non-public ones. The latter have restricted access, are set to be hidden, and are not described here.

Detailed information on partitions can be retrieved with

scontrol show partition <partitionname>

Quality of service (QoS) values can be viewed with

sacctmgr show qos <qos_of_that_partition_name>
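Both lookups can be combined into a small helper, sketched below. The partition name 'short' is only a placeholder, the QoS name is assumed to match the partition name (it may differ), and the guard lets the snippet degrade gracefully on machines without SLURM:

```shell
# Show the limits of a partition and its QoS; 'short' is a placeholder -
# pick a name from the output of `sinfo`. The QoS is assumed to share the
# partition's name, which need not hold on every system.
inspect_partition() {
    local part="$1"
    if command -v scontrol >/dev/null 2>&1; then
        scontrol show partition "$part"
        sacctmgr show qos "$part" format=Name,MaxWall,MaxTRESPU
    else
        echo "SLURM tools not found - run this on a Mogon login node"
    fi
}

inspect_partition short
```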

The partition constraints below will occasionally cause SLURM to leave a job pending. Common pending reasons are described here.

In SLURM a partition can be selected in your jobscript by

#SBATCH -p <partitionname>

or interactively:

sbatch -p <partitionname> … <jobscript>

Several partitions can be selected with

#SBATCH -p <partition1>,<partition2>

This can be useful for users with “private” hardware: it allows a job to be scheduled onto general-purpose hardware when the group-owned hardware is occupied.

Partition Nodes Max wall time % nodes Interconnect Constraints
short a-nodes 5 hours 25 Infiniband jobs using n « 64; max running jobs per user: 10,000
long a-nodes 5 days 20 Infiniband jobs using n « 64; max running jobs per user: 3,000
nodeshort a-nodes 5 hours 100 Infiniband jobs using n*64, for 1 < n < all of Mogon
nodelong a-nodes 5 days 30 Infiniband jobs using n*64, for 1 < n < all of Mogon; max running jobs per association: 100
devel a-nodes 4 hours 1 Infiniband max running jobs per user: 1
visualize a-nodes 5 hours 1 Infiniband max TRES per user: cpu=129

The default memory for a partition is listed by the command giving further details: scontrol show partition <partition name>. If you require more memory per node than the defaults provide, the Mogon I a-nodes offer:

Memory [MiB] No. of Nodes 1)
115500 444
242500 96
497500 15

Partition Nodes Max wall time Interconnect Accelerators Comment
titanshort i-nodes 5 hours Infiniband 4 GeForce GTX TITAN per node see using GPUs under SLURM
titanlong i-nodes 5 days Infiniband 4 GeForce GTX TITAN per node see using GPUs under SLURM
teslashort h-nodes 5 hours Infiniband - see using GPUs under SLURM
teslalong h-nodes 5 days Infiniband - see using GPUs under SLURM
m2_gpu s-nodes 5 days Infiniband 6 GeForce GTX 1080 Ti per node -

Only ~5% of nodes are available for small jobs (n « 40). Each account has a GrpTRESRunLimit; check it with

sacctmgr -s list account <your_account> format=account,GrpTRESRunMin

You can use

sacctmgr -n -s list user $USER format=Account | grep -v none

to get your accounts. The default is cpu=22982400, which is the equivalent of using 700 nodes for 12 hours in total.
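Putting the partition selection together, a minimal jobscript might look like the sketch below. All names and resource values are placeholders to be adjusted to the tables on this page:

```shell
#!/bin/bash
# Minimal jobscript sketch; partition names and resources are placeholders.
#SBATCH -p short,long      # try 'short' first, fall back to 'long'
#SBATCH -N 1               # one node
#SBATCH -t 00:30:00        # 30 minutes wall time
#SBATCH -J partition_demo  # job name

# The #SBATCH lines are plain comments to bash, so this script also runs
# outside SLURM; SLURM_JOB_PARTITION is only set inside a running job.
echo "partition: ${SLURM_JOB_PARTITION:-not running under SLURM}"
```

Listing two partitions after -p mirrors the fallback pattern described above for group-owned hardware.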

Partition Nodes Max wall time Node memory classes Interconnect Accelerators Comment
parallel z-nodes, x-nodes 5 days 64GB,96GB,128GB,192GB,256GB-nodes Intel Omnipath - jobs using n*40 or n*64
smp z-nodes, x-nodes 5 days up to 5% of 64GB,96GB,128GB,192GB,256GB-nodes Intel Omnipath - jobs using n « 40 or n « 64; max running jobs per user: 3,000
bigmem z-nodes, x-nodes 5 days 384GB,512GB,1TB,1.5TB-nodes Intel Omnipath - jobs in need of much memory
devel z-nodes, x-nodes 4 hours 10 64GB,96GB,128GB-nodes Intel Omnipath - max running jobs per user: 2

For the parallel partition we find:

Memory [MiB] No. of Nodes 2) Type
88500 576 skylake
177000 120 skylake

Likewise for the bigmem partition:

Memory [MiB] No. of Nodes 3) Type
354000 32 skylake
1516000 2 skylake

Hidden Partitions

Information on hidden partitions can be viewed by anyone. These partitions are set to be hidden only to avoid cluttering the output of every poll: they are 'private' to certain projects / groups and of interest only to those groups.

To visualize all jobs for a user in all partitions supply the -a flag:

squeue -u $USER -a

Likewise, sinfo can be supplemented with -a to gather information. All other commands work as expected without this flag.
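For instance, the following sketch lists your jobs and the node overview across all partitions, hidden ones included, and degrades gracefully where SLURM is not installed:

```shell
# Survey all partitions, including hidden ones, with the -a flag.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -a     # your jobs in every partition
    sinfo -a                 # node/partition overview incl. hidden ones
else
    echo "SLURM tools not found - run this on a Mogon login node"
fi
```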

1) , 2) , 3)
if all nodes are functional