slurm_manage

Differences

This shows you the differences between two versions of the page.

slurm_manage [2018/02/25 07:03] meesters [Software Errors]
slurm_manage [2019/05/06 10:17] (current) meesters [Pending Reasons]
Line 17: Line 17:
 </​WRAP>​ </​WRAP>​
  
-<WRAP center round important 80%> 
-On Mogon I, pending a slurm-update,​ ''​seff''​ is available as 
- 
-''​%%$ /​cluster/​system/​tools/​seff <​jobid>​%%''​ 
- 
-</​WRAP>​ 
  
  
Line 49: Line 43:
 ====== Pending Reasons ======
  
-So, why do my jobs not start? SLURM may list a number of reasons for pending jobs (those labelled ''PD'', when ''squeue'' is triggered).
+So, why do my jobs not start? SLURM may list a number of reasons for pending jobs (those labelled ''PD'', when ''squeue'' is triggered). Here, we show some more frequent reasons:
  
 ^ Reason ^ Brief Explanation ^
Line 55: Line 49:
 | ''AssocGrpCPURunMinutesLimit'' | Indicates that the partition's associated quality of service, in terms of CPU time, is exhausted for the [[accounts|account / association in question]]. This number will recover. |
 | ''QOSMaxJobsPerUserLimit'' | For certain partitions the number of running jobs per user is limited. |
-| ''QOSMaxJobsPerAccountLimit'' | For certain partitions the number of running jobs per account is limited. |
+| ''QOSMaxJobsPerAccountLimit'' | For certain partitions the number of running jobs per account is limited. |
+| ''QOSGrpGRESRunMinutes'' | For certain partitions the generic resources (e.g. GPUs) are limited. See [[gpu|GPU Queues]] |
 | ''QOSGrpMemLimit'' | The requested partition is limited in the fraction of resources it can take from the cluster, and this amount has been reached: jobs need to end before new ones may start. |
+| ''QOSGrpCpuLimit'' | The requested partition is limited in the fraction of resources it can take from the cluster, and this amount has been reached: jobs need to end before new ones may start. |
 | ''Resources'' | While the partition may allow you to take the resources you requested, it cannot -- at the time -- provide the nodes to run on (e.g. because of a memory request which cannot be satisfied). |
+| ''ReqNodeNotAvail'' | Simply means that no node with the required resources is available. SLURM will list //all// non-available nodes, which can be confusing. This reason is similar to ''Priority'' as it means that a specific job has to wait for a resource to be released. |
  
 And then there are limitations due to the number of jobs a user or group (a.k.a. account) may run at a given time. More information on partitions can be found [[partitions|on their respective wiki site]].
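
To see which of these reasons apply to your own pending jobs, the output of ''squeue'' can be grouped by reason. The following is a minimal sketch, not a site-provided tool: the helper names ''parse_pending'' and ''pending_reasons'' are hypothetical, and the job IDs in the sample are made up. It assumes ''squeue'' is called with the format string ''%i|%r'' (job ID and reason) and the options ''--states=PD'' and ''--noheader''.

```python
import subprocess
from collections import defaultdict


def parse_pending(squeue_output):
    """Group job IDs by pending reason from lines of the form 'jobid|reason'."""
    by_reason = defaultdict(list)
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first '|': the reason text may contain commas or colons.
        job_id, _, reason = line.partition("|")
        by_reason[reason.strip() or "Unknown"].append(job_id.strip())
    return dict(by_reason)


def pending_reasons(user):
    """Query squeue for a user's pending jobs (needs a SLURM client on the host)."""
    cmd = ["squeue", f"--user={user}", "--states=PD", "--noheader", "--format=%i|%r"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_pending(result.stdout)


# Illustrative input in the assumed 'jobid|reason' layout; the job IDs are invented.
sample = "1001|Priority\n1002|Resources\n1003|Priority\n"
for reason, jobs in sorted(parse_pending(sample).items()):
    print(f"{reason}: {', '.join(jobs)}")
```

On a login node, ''pending_reasons("your_username")'' would return the same kind of mapping for real jobs; the reasons it reports can then be looked up in the table above.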
slurm_manage.1519538584.txt.gz · Last modified: 2018/02/25 07:03 by meesters