====== General Notes ======

On Mogon we differentiate between public partitions (those readily visible with ''sinfo'') and non-public ones. The latter have restricted access and will not be described here. They are set to be [[partitions#Hidden_Partitions|hidden]].

Detailed information on partitions can be retrieved with

<code bash>
scontrol show partition <partitionname>
</code>

Quality of service (QoS) values can be viewed with

<code bash>
sacctmgr show qos <qos_of_that_partition_name>
</code>
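The QoS name of a partition appears in the ''QoS='' field of the ''scontrol'' output, so it can be looked up before querying it; a minimal sketch (''nodeshort'' is only an example partition name):

<code bash>
# find the QoS attached to a partition; the partition name is only an example
scontrol show partition nodeshort | grep -o 'QoS=[^ ]*'

# then query that QoS, substituting the name found above
sacctmgr show qos <qos_name>
</code>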
<WRAP center round info 90%>
The partition constraints below will occasionally cause SLURM to leave a job pending. Common pending reasons are described [[slurm_manage|here]].
</WRAP>
===== Submitting to partitions =====

In SLURM a partition can be selected in your jobscript with
<code bash>
#SBATCH -p <partitionname>
</code>
or on the command line: ''$ sbatch -p <partitionname> ... <jobscript>''
Several partitions can be selected with
<code bash>
#SBATCH -p <partition1>,<partition2>
</code>
This can be useful for users with "private" hardware: it allows a job to be scheduled onto general purpose hardware when the group-owned hardware is occupied (see the sketch below).
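A minimal jobscript sketch, assuming a hypothetical group-owned partition named ''mygroup'' with the public ''parallel'' partition as fallback; SLURM starts the job in whichever listed partition can run it first:

<code bash>
#!/bin/bash
#SBATCH -p mygroup,parallel   # hypothetical group partition, public fallback
#SBATCH -N 1                  # one node
#SBATCH -t 02:00:00           # 2 hours wall time

srun ./your_program           # placeholder
</code>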
===== Mogon 1 =====

==== General purpose CPUs ====

^ Partition  ^ Nodes  ^ Max wall time  ^ % of nodes  ^ Interconnect  ^ Constraints  ^
| short  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 25 | Infiniband | jobs using n << 64, max. running jobs per user: 10,000 |
| long  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 days | 20 | Infiniband | jobs using n << 64, max. running jobs per user: 3,000 |
| nodeshort  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 100 | Infiniband | jobs using n*64, for 1 < n < all of Mogon |
| nodelong  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 days | 30 | Infiniband | jobs using n*64, for 1 < n < all of Mogon, max. running jobs per association: 100 |
| devel  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 4 hours | 1 | Infiniband | max. running jobs per user: 1 |
| visualize  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 1 | Infiniband | max. TRES per user: cpu=129 |
The default memory for a partition is listed by the command giving further details: ''scontrol show partition <partitionname>''.

If you require more memory per node than defined by the defaults, the Mogon I a-nodes offer the following (a sketch for requesting such a node follows the table):
^ Memory [MiB]  ^ No. of nodes ((if all nodes are functional))  ^
| 115500  | 444 |
| 242500  | 96 |
| 497500  | 15 |
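A minimal jobscript sketch requesting one of the larger-memory a-nodes; the partition name and program are placeholders, the memory value is taken from the table above:

<code bash>
#!/bin/bash
#SBATCH -p nodeshort          # example partition
#SBATCH -N 1
#SBATCH --mem=242500M         # matches the 96 nodes with 242500 MiB
#SBATCH -t 01:00:00

srun ./your_program           # placeholder
</code>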
==== Partitions for Applications using Accelerators ====

^ Partition  ^ Nodes  ^ Max wall time  ^ Interconnect  ^ Accelerators  ^ Comment  ^
| titanshort  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|i-nodes]]  | 5 hours | Infiniband | 4 GeForce GTX TITAN per node | see [[lsf_gpu|using GPUs under SLURM]] |
| titanlong  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|i-nodes]]  | 5 days | Infiniband | 4 GeForce GTX TITAN per node | see [[lsf_gpu|using GPUs under SLURM]] |
| teslashort  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|h-nodes]]  | 5 hours | Infiniband | - | see [[lsf_gpu|using GPUs under SLURM]] |
| teslalong  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|h-nodes]]  | 5 days | Infiniband | - | see [[lsf_gpu|using GPUs under SLURM]] |
| m2_gpu | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|s-nodes]] | 5 days | Infiniband | 6 GeForce GTX 1080 Ti per node | - |
| deeplearning | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|dgx-nodes]] | 12 hours | Infiniband | 8 Tesla V100-SXM2 per node | - |
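GPUs on these partitions are requested as generic resources; a minimal sketch, assuming standard SLURM ''--gres'' syntax (see [[lsf_gpu|using GPUs under SLURM]] for the authoritative, cluster-specific details; ''your_gpu_program'' is a placeholder):

<code bash>
#!/bin/bash
#SBATCH -p m2_gpu             # partition from the table above
#SBATCH -N 1
#SBATCH --gres=gpu:2          # request 2 of the 6 GTX 1080 Ti on an s-node
#SBATCH -t 01:00:00

srun ./your_gpu_program       # placeholder
</code>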
===== Mogon 2 =====

Only ~5% of the nodes are available for small jobs (n << 40). Each account has a GrpTRESRunLimit; check it with ''sacctmgr -s list account <your_account> format=account,GrpTRESRunMin''. Your accounts can be listed with ''sacctmgr -n -s list user $USER format=Account | grep -v none''. The default is cpu=22982400 CPU-minutes, roughly the equivalent of using 700 nodes for 12 hours in total.
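Put together, the check looks like this:

<code bash>
# list the accounts you belong to
sacctmgr -n -s list user $USER format=Account | grep -v none

# show the GrpTRESRunMin limit of one of these accounts
sacctmgr -s list account <your_account> format=account,GrpTRESRunMin
</code>

The following partitions are available: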
^ Partition  ^ Nodes  ^ Max wall time  ^ Node memory classes  ^ Interconnect  ^ Accelerators  ^ Comment  ^
| parallel  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 5 days | 64GB, 96GB, 128GB, 192GB, 256GB nodes | Intel Omni-Path | - | jobs using n*40 or n*64 |
| smp | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]]  | 5 days | up to 5% of the 64GB, 96GB, 128GB, 192GB, 256GB nodes | Intel Omni-Path | - | jobs using n << 40 or n << 64, max. running jobs per user: 3,000 |
| bigmem | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]]  | 5 days | 384GB, 512GB, 1TB, 1.5TB nodes | Intel Omni-Path | - | memory-intensive jobs |
| devel  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 4 hours | 10 of the 64GB, 96GB, 128GB nodes | Intel Omni-Path | - | max. running jobs per user: 2 |
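A minimal sketch of a full-node job in ''parallel'', assuming the 40-core node type (use multiples of 64 on the 64-core type; ''your_mpi_program'' is a placeholder):

<code bash>
#!/bin/bash
#SBATCH -p parallel
#SBATCH -N 2                  # two full nodes
#SBATCH --ntasks-per-node=40  # 40 tasks per node on the 40-core type
#SBATCH -t 04:00:00

srun ./your_mpi_program       # placeholder
</code>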
For the ''parallel'' partition we find:

^ Memory [MiB]  ^ No. of nodes ((if all nodes are functional))  ^ Type ^
| 57000  | 584 | Broadwell |
| 88500  | 576 | Skylake |
| 120000  | 120 | Broadwell |
| 177000  | 120 | Skylake |
| 246000  | 40 | Broadwell |
Likewise for the ''bigmem'' partition:

^ Memory [MiB]  ^ No. of nodes ((if all nodes are functional))  ^ Type ^
| 354000  | 32 | Skylake |
| 498000  | 20 | Broadwell |
| 1002000  | 2 | Broadwell |
| 1516000  | 2 | Skylake |
====== Hidden Partitions ======

Information on hidden partitions can be viewed by anyone. These partitions are set to be hidden to avoid cluttering the output of every poll: they are 'private' to certain projects/groups and only of interest to those groups.

To display all jobs of a user across all partitions, supply the ''-a'' flag:
<code bash>
$ squeue -u $USER -a
</code>
Likewise, ''sinfo'' can be supplemented with ''-a'' to gather information on all partitions; all other commands work as expected without this flag:
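<code bash>
$ sinfo -a
</code>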