Partitions

General Notes

On Mogon we differentiate between public partitions (those readily visible with sinfo) and non-public ones. The latter have restricted access, are set to be hidden, and are not described here.

Detailed information on partitions can be retrieved with

scontrol show partition <partitionname>
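
For example, to display the limits and defaults of the nodeshort partition on Mogon 1:

$ scontrol show partition nodeshort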

Quality of service (QoS) values can be viewed with

sacctmgr show qos <qos_of_that_partition_name>
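
To keep the rather wide output readable, it can be restricted to a few columns; a sketch using standard sacctmgr QOS format fields:

$ sacctmgr show qos <qos_of_that_partition_name> format=Name,MaxWall,MaxTRESPerUser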

The partition constraints below will occasionally cause SLURM to leave a job pending. Common pending reasons are described here.

Submitting to partitions

In SLURM a partition can be selected in your jobscript by

#SBATCH -p <partitionname>

or on the command line: $ sbatch -p <partitionname> … <jobscript>

Several partitions can be selected with

#SBATCH -p <partition1>,<partition2>

This can be useful for users with “private” hardware: it allows a job to be scheduled onto general-purpose hardware when the group-owned hardware is occupied, as in the sketch below.
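
A minimal jobscript sketch for this use case, assuming a job with fewer than 64 cores that may run either on a group-owned partition or on the public short partition; <private_partition>, <your_account>, the resource values and the executable are placeholders:

#!/bin/bash
#SBATCH -p <private_partition>,short   # the job starts in whichever listed partition can run it earliest
#SBATCH -A <your_account>              # project account
#SBATCH -n 16                          # number of tasks (short requires n < 64)
#SBATCH -t 02:00:00                    # wall time, within the 5 hour limit of short

srun ./my_program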

Mogon 1

General purpose CPUs

Partition | Nodes   | Max wall time | % nodes | Interconnect | Constraints
short     | a-nodes | 5 hours       | 25      | InfiniBand   | jobs using n < 64; max. running jobs per user: 10,000
long      | a-nodes | 5 days        | 20      | InfiniBand   | jobs using n < 64; max. running jobs per user: 3,000
nodeshort | a-nodes | 5 hours       | 100     | InfiniBand   | jobs using n*64 cores, for 1 < n < all of Mogon
nodelong  | a-nodes | 5 days        | 30      | InfiniBand   | jobs using n*64 cores, for 1 < n < all of Mogon; max. running jobs per association: 100
devel     | a-nodes | 4 hours       | 1       | InfiniBand   | max. running jobs per user: 1
visualize | a-nodes | 5 hours       | 1       | InfiniBand   | max. TRES per user: cpu=129
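
For instance, a two-node job in nodeshort requests full nodes in multiples of 64 cores; a sketch (only the partition name and the cores per node are taken from the table above):

#SBATCH -p nodeshort
#SBATCH -N 2                    # two full a-nodes
#SBATCH --ntasks-per-node=64    # a-nodes provide 64 cores each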

The default memory for each partition is listed in the output of scontrol show partition <partitionname>.

If you require more memory per node than defined by the defaults, the Mogon I a-nodes offer:

Memory [MiB] | No. of Nodes 1)
115500       | 444
242500       | 96
497500       | 15
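
To land on one of the larger-memory a-nodes, the memory per node can be requested explicitly; a sketch, assuming --mem (memory per node) is the appropriate request for your job:

#SBATCH -p short
#SBATCH --mem=242500M    # restricts the job to the a-nodes with at least 242500 MiB (96 + 15 nodes)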

Partitions for Applications using Accelerators

Partition    | Nodes     | Max wall time | Interconnect | Accelerators                   | Comment
titanshort   | i-nodes   | 5 hours       | InfiniBand   | 4 GeForce GTX TITAN per node   | see using GPUs under SLURM
titanlong    | i-nodes   | 5 days        | InfiniBand   | 4 GeForce GTX TITAN per node   | see using GPUs under SLURM
teslashort   | h-nodes   | 5 hours       | InfiniBand   | -                              | see using GPUs under SLURM
teslalong    | h-nodes   | 5 days        | InfiniBand   | -                              | see using GPUs under SLURM
m2_gpu       | s-nodes   | 5 days        | InfiniBand   | 6 GeForce GTX 1080 Ti per node | -
deeplearning | dgx-nodes | 12 hours      | InfiniBand   | 8 Tesla V100-SXM2 per node     | -
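
A sketch for a single-node GPU job; the generic GRES name 'gpu' is an assumption here, the page on using GPUs under SLURM has the site-specific details:

#SBATCH -p titanshort
#SBATCH -N 1
#SBATCH --gres=gpu:4    # request all four GPUs of an i-node (assumed GRES name 'gpu')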

Mogon 2

Only ~5% of the nodes are available for small jobs (n < 40). Each account has a GrpTRESRunMins limit, a cap on the total TRES-minutes its running jobs may occupy at any time. Check it with

sacctmgr -s list account <your_account> format=account,GrpTRESRunMin

and list your accounts with

sacctmgr -n -s list user $USER format=Account | grep -v none

The default is cpu=22982400, i.e. 22,982,400 CPU-minutes, which is the equivalent of using 700 nodes for 12 hours in total (22982400 CPU-minutes / 720 minutes = 31,920 cores, spread over nodes with 40 or 64 cores each):

Partition | Nodes            | Max wall time | Nodes available                                       | Interconnect    | Accelerators | Comment
parallel  | z-nodes, x-nodes | 5 days        | 64GB, 96GB, 128GB, 192GB, 256GB nodes                 | Intel Omni-Path | -            | jobs using n*40 or n*64 cores
smp       | z-nodes, x-nodes | 5 days        | up to 5% of the 64GB, 96GB, 128GB, 192GB, 256GB nodes | Intel Omni-Path | -            | jobs using n < 40 or n < 64; max. running jobs per user: 3,000
bigmem    | z-nodes, x-nodes | 5 days        | 384GB, 512GB, 1TB, 1.5TB nodes                        | Intel Omni-Path | -            | jobs in need of large memory
devel     | z-nodes, x-nodes | 4 hours       | 10 of the 64GB, 96GB, 128GB nodes                     | Intel Omni-Path | -            | max. running jobs per user: 2
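
A sketch of a full-node job in the parallel partition; whether a node offers 40 (Broadwell) or 64 (Skylake) cores depends on its type, so the per-node task count below assumes Broadwell nodes:

#SBATCH -p parallel
#SBATCH -N 4                    # four full nodes
#SBATCH --ntasks-per-node=40    # 40 cores per Broadwell node (use 64 on Skylake nodes)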

For the parallel partition we find:

Memory [MiB] | No. of Nodes 2) | Type
57000        | 584             | Broadwell
88500        | 576             | Skylake
120000       | 120             | Broadwell
177000       | 120             | Skylake
246000       | 40              | Broadwell
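
The configured memory and features of the nodes behind a partition can also be queried directly with sinfo's output formatting, for example:

$ sinfo -p parallel -o "%20N %10m %30f"    # node list, memory per node, node features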

Likewise for the bigmem partition:

Memory [MiB] | No. of Nodes 3) | Type
354000       | 32              | Skylake
498000       | 20              | Broadwell
1002000      | 2               | Broadwell
1516000      | 2               | Skylake
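
For example, a job needing roughly 350 GiB on a single node could request the following (the memory value is taken from the table above, everything else is a placeholder):

#SBATCH -p bigmem
#SBATCH -N 1
#SBATCH --mem=354000M    # fits the 32 Skylake nodes with 354000 MiB, and anything larger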

Hidden Partitions

Information on hidden partitions can be viewed by anyone. These partitions are set to be hidden merely to avoid cluttering the output of every poll: they are 'private' to certain projects or groups and only of interest to those groups.

To display all jobs of a user across all partitions, including hidden ones, supply the -a flag:

$ squeue -u $USER -a

Likewise, sinfo can be supplemented with -a to include hidden partitions in its output. All other commands work as expected without this flag.

1), 2), 3) if all nodes are functional