====== General Notes ======

On Mogon we differentiate between public partitions (those readily visible with ''sinfo'') and non-public ones. The latter have restricted access and will not be described here. They are set to be [[partitions#Hidden_Partitions|hidden]].

Detailed information on partitions can be retrieved with

<code bash>
scontrol show partition <partitionname>
</code>

Quality of service (QoS) values can be viewed with

<code bash>
sacctmgr show qos <qos_of_that_partition_name>
</code>
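
For example, to inspect the ''nodeshort'' partition described below and its QoS (the QoS name is an assumption; check the ''QoS='' field in the ''scontrol'' output for the actual name):

<code bash>
# show limits and defaults of the nodeshort partition
scontrol show partition nodeshort
# show the QoS attached to it (name assumed to match the partition)
sacctmgr show qos nodeshort
</code>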

<WRAP center round info 90%>
The partition constraints below will occasionally cause SLURM to leave a job pending. Common pending reasons are described [[slurm_manage|here]].
</WRAP>
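
If a job remains pending, the reason SLURM reports can be checked directly, for example (a minimal sketch; the ''%r'' format field prints the pending reason):

<code bash>
# list your pending jobs together with the reason SLURM gives
squeue -u $USER -t PENDING -o "%.18i %.12P %.8j %.2t %r"
</code>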

===== Submitting to partitions =====

In SLURM a partition can be selected in your jobscript by
<code bash>
#SBATCH -p <partitionname>
</code>
or on the command line: ''$ sbatch -p <partitionname> ... <jobscript>''
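
Putting this together, a minimal jobscript might look like the following sketch (job name, partition, resources and runtime are placeholders, not recommendations):

<code bash>
#!/bin/bash
#SBATCH -J myjob            # job name (placeholder)
#SBATCH -p short            # partition, see the tables below
#SBATCH -n 4                # number of tasks (placeholder)
#SBATCH -t 01:00:00         # wall time, must fit the partition's limit
#SBATCH -A <youraccount>    # your SLURM account

srun ./my_program           # replace with your application
</code>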

Several partitions can be selected with
<code bash>
#SBATCH -p <partition1>,<partition2>
</code>
This can be useful for users with "private" hardware: it allows a job to be scheduled onto general-purpose hardware when the group-owned hardware is occupied.
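
The same works on the command line, for example (partition names are placeholders; SLURM starts the job in whichever listed partition can run it first):

<code bash>
sbatch -p <group_partition>,nodeshort <jobscript>
</code>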

===== Mogon 1 =====

==== General purpose CPUs ====

^ Partition   ^ Nodes        ^ Max wall time   ^ Max. % of nodes    ^ Interconnect  ^ Constraints            ^
| short       | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 25  | Infiniband | jobs using n << 64, max running jobs per user: 10,000 |
| long        | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 days  | 20  | Infiniband | jobs using n << 64, max running jobs per user: 3,000 |
| nodeshort   | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 100 | Infiniband | jobs using n*64, for 1 < n < all of Mogon |
| nodelong    | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 days  | 30  | Infiniband | jobs using n*64, for 1 < n < all of Mogon, max running jobs per association: 100 |
| devel       | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 4 hours | 1   | Infiniband | max running jobs per user: 1 |
| visualize   | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|a-nodes]]  | 5 hours | 1   | Infiniband | max TRES per user: cpu=129 |

The default memory of a partition is listed by the command giving further details: ''scontrol show partition <partitionname>''.

If you require more memory per node than provided by the defaults, the Mogon I a-nodes offer

^ Memory [MiB]          ^ No. of Nodes ((if all nodes are functional))       ^
| 115500                | 444 |
| 242500                | 96  |
| 497500                | 15  |
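
To land on one of the larger-memory nodes, the required memory can be requested explicitly, for example (the value is only an illustration; it must not exceed the node class you are aiming for):

<code bash>
#SBATCH -p nodeshort
#SBATCH --mem=242500M    # request a node of the 242500 MiB class
</code>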

=== Partition Limits ===

We have put limits in place to prevent single users or groups from taking up all resources of a given partition. These limits may be adjusted to improve system utilization and are therefore not listed in detail. They may result in pending jobs; [[slurm_manage#pending_reasons|pending reasons are listed in our wiki]].


==== Partitions for Applications using Accelerators ====

^ Partition     ^ Nodes        ^ Max wall time      ^ Interconnect ^ Accelerators ^ Comment            ^
| titanshort    | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|i-nodes]]    | 5 hours  | Infiniband | 4 GeForce GTX TITAN per node   | see [[lsf_gpu|using GPUs under SLURM]] |
| titanlong     | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|i-nodes]]    | 5 days   | Infiniband | 4 GeForce GTX TITAN per node   | see [[lsf_gpu|using GPUs under SLURM]] |
| teslashort    | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|h-nodes]]    | 5 hours  | Infiniband | -                              | see [[lsf_gpu|using GPUs under SLURM]] |
| teslalong     | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|h-nodes]]    | 5 days   | Infiniband | -                              | see [[lsf_gpu|using GPUs under SLURM]] |
| m2_gpu        | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_i|s-nodes]]    | 5 days   | Infiniband | 6 GeForce GTX 1080 Ti per node | - |
| deeplearning  | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|dgx-nodes]] | 12 hours | Infiniband | 8 Tesla V100-SXM2 per node     | [[mailto:hpc@uni-mainz.de|for access get in touch with us]] |
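
In addition to the partition, a GPU job has to request its accelerators; a minimal sketch (partition and GPU count are examples, see the linked GPU page for details):

<code bash>
#SBATCH -p m2_gpu
#SBATCH -n 1
#SBATCH --gres=gpu:1    # request one GPU on the node
</code>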


===== Mogon 2 =====

Only ~5% of the nodes are available for small jobs (n << 40). Each account has a GrpTRESRunLimit. You can check it with ''sacctmgr -s list account <your_account> format=account,GrpTRESRunMin''; your accounts can be listed with ''sacctmgr -n -s list user $USER format=Account | grep -v none''. The default is cpu=22982400, which is the equivalent of using 700 nodes for 12 hours in total.
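
Run one after the other, the two commands look like this:

<code bash>
# list the accounts you belong to
sacctmgr -n -s list user $USER format=Account | grep -v none
# show the GrpTRESRunMin limit of one of them (replace the placeholder)
sacctmgr -s list account <your_account> format=account,GrpTRESRunMin
</code>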

^ Partition  ^ Nodes        ^ Max wall time   ^ Node memory classes    ^ Interconnect ^ Accelerators ^ Comment            ^
| parallel   | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 5 days  | 64GB, 96GB, 128GB, 192GB, 256GB nodes                 | Intel Omni-Path | - | jobs using n*40 or n*64 |
| smp        | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 5 days  | up to 5% of the 64GB, 96GB, 128GB, 192GB, 256GB nodes | Intel Omni-Path | - | jobs using n << 40 or n << 64, max running jobs per user: 3,000 |
| bigmem     | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 5 days  | 384GB, 512GB, 1TB, 1.5TB nodes                        | Intel Omni-Path | - | jobs needing large amounts of memory |
| devel      | [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|z-nodes]] [[https://mogonwiki.zdv.uni-mainz.de/dokuwiki/nodes#mogon_ii|x-nodes]] | 4 hours | 10 of the 64GB, 96GB, 128GB nodes                     | Intel Omni-Path | - | max running jobs per user: 2 |
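
For instance, a whole-node job in the ''parallel'' partition could be requested as follows (node and task counts are placeholders; use a multiple of 40 or 64 tasks per node, depending on the node type):

<code bash>
#SBATCH -p parallel
#SBATCH -N 2                    # two full nodes (placeholder)
#SBATCH --ntasks-per-node=40    # or 64, depending on the node type
</code>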

For the ''parallel'' partition we find:

^ Memory [MiB]          ^ No. of Nodes ((if all nodes are functional))       ^ Type ^
| 57000                 | 584 | Broadwell |
| 88500                 | 576 | Skylake |
| 120000                | 120 | Broadwell |
| 177000                | 120 | Skylake |
| 246000                | 40  | Broadwell |

Likewise for the ''bigmem'' partition:

^ Memory [MiB]          ^ No. of Nodes ((if all nodes are functional))       ^ Type ^
| 354000                | 32 | Skylake |
| 498000                | 20 | Broadwell |
| 1002000               | 2  | Broadwell |
| 1516000               | 2  | Skylake |


====== Hidden Partitions ======

Information on hidden partitions can be viewed by anyone. These partitions are set to hidden only to avoid cluttering the default output: they are 'private' to certain projects or groups and of interest to those groups only.

To display all jobs of a user across all partitions, supply the ''-a'' flag:

<code bash>
$ squeue -u $USER -a
</code>

Likewise, ''sinfo'' can be supplemented with ''-a'' to gather this information. All other commands work as expected without this flag.
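
For example, to display the nodes of a specific hidden partition (the name is a placeholder):

<code bash>
# -a includes hidden partitions, -p restricts the output to one partition
sinfo -a -p <partitionname>
</code>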