slurm_manage

slurm_manage revision 2019/05/06 10:17 by meesters [Pending Reasons], compared with revision 2018/05/28 22:09 by meesters [Information on Jobs]
====== Pending Reasons ======
  
So, why do my jobs not start? SLURM may list a number of reasons for pending jobs (those labelled ''PD'' when ''squeue'' is run). Here are some of the more frequent reasons:
  
^ Reason ^ Brief Explanation ^
| ''AssocGrpCPURunMinutesLimit'' | The CPU-time budget (in CPU run minutes) available to the [[accounts|account / association in question]] is exhausted. This number recovers as running jobs progress and finish. |
| ''QOSMaxJobsPerUserLimit'' | For certain partitions the number of running jobs per user is limited. |
| ''QOSMaxJobsPerAccountLimit'' | For certain partitions the number of running jobs per account is limited. |
| ''QOSGrpGRESRunMinutes'' | For certain partitions the use of generic resources (e.g. GPUs) is limited. See [[gpu|GPU Queues]]. |
| ''QOSGrpMemLimit'' | The requested partition may only take a limited share of the cluster's memory, and this share is currently in use: running jobs need to end before new ones can start. |
| ''QOSGrpCpuLimit'' | The requested partition may only take a limited share of the cluster's CPUs, and this share is currently in use: running jobs need to end before new ones can start. |
| ''Resources'' | The partition permits the resources you requested, but it cannot currently provide nodes to run on (e.g. because a memory request cannot be satisfied). |
| ''ReqNodeNotAvail'' | No node with the required resources is currently available. SLURM lists //all// unavailable nodes, which can be confusing. Like ''Priority'', this reason means that the job has to wait until a resource is released. |
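To see which of these reasons apply to your own jobs, the pending reasons can be collected from ''squeue'' and grouped. The following is a minimal Python sketch, not an official tool: it assumes the SLURM client tools are on your ''PATH'' and uses the ''squeue'' format codes ''%i'' (job id) and ''%r'' (reason). The names ''parse_pending'', ''summarize'', ''my_pending_summary'' and the short hints in ''HINTS'' are hypothetical; the hints only paraphrase the table above.

```python
import getpass
import subprocess
from collections import Counter

# Pending jobs only (-t PD), no header (-h); %i = job id, %r = reason for the job's state.
SQUEUE_PENDING = ["squeue", "-h", "-t", "PD", "-o", "%i|%r"]

# Hypothetical one-line hints paraphrasing the table; unknown reasons get a fallback.
HINTS = {
    "AssocGrpCPURunMinutesLimit": "CPU-time budget of the account exhausted; recovers as jobs run",
    "QOSMaxJobsPerUserLimit": "running-job limit per user of the partition reached",
    "QOSMaxJobsPerAccountLimit": "running-job limit per account of the partition reached",
    "QOSGrpGRESRunMinutes": "generic-resource (e.g. GPU) limit of the partition reached",
    "QOSGrpMemLimit": "memory share of the partition in use; wait for jobs to end",
    "QOSGrpCpuLimit": "CPU share of the partition in use; wait for jobs to end",
    "Resources": "no nodes currently free for this request",
    "ReqNodeNotAvail": "no node with the required resources available",
}


def parse_pending(output):
    """Map job id -> pending reason from '%i|%r' formatted squeue lines."""
    reasons = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, _, reason = line.partition("|")
        # Keep only the reason code; details (e.g. unavailable nodes) may follow a comma.
        reasons[job_id] = reason.split(",", 1)[0].strip()
    return reasons


def summarize(reasons):
    """Return (reason, job count, hint) tuples, most frequent reason first."""
    counts = Counter(reasons.values())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(reason, n, HINTS.get(reason, "no hint available")) for reason, n in ordered]


def my_pending_summary():
    """Query squeue for the current user's pending jobs and summarize them."""
    cmd = SQUEUE_PENDING + ["-u", getpass.getuser()]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return summarize(parse_pending(out))
```

On a login node, ''for reason, n, hint in my_pending_summary(): print(n, reason, hint)'' prints one line per reason, most frequent first. Parsing is kept separate from the ''squeue'' call so it can be tried on saved output.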
  
In addition, there are limits on the number of jobs a user or group (a.k.a. account) may run at a given time. More information on partitions can be found [[partitions|on their respective wiki page]].
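To check how close you are to such a limit, you can count running jobs per user and per account. This is a minimal Python sketch, assuming output of ''squeue -h -t R -o "%u|%a"'' (''%u'' = user name, ''%a'' = account). The function names and any limit value passed to ''headroom'' are hypothetical; the real limits depend on the partition. Since they apply per partition, you may want to add ''-p <partition>'' to the command.

```python
import subprocess
from collections import Counter

# Running jobs only (-t R), no header (-h); %u = user name, %a = account.
SQUEUE_RUNNING = ["squeue", "-h", "-t", "R", "-o", "%u|%a"]


def running_jobs_output():
    """Return raw squeue output for all running jobs (needs SLURM on PATH)."""
    return subprocess.run(SQUEUE_RUNNING, capture_output=True, text=True, check=True).stdout


def count_running(output):
    """Return (jobs per user, jobs per account) from '%u|%a' formatted lines."""
    per_user, per_account = Counter(), Counter()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        user, _, account = line.partition("|")
        per_user[user] += 1
        per_account[account] += 1
    return per_user, per_account


def headroom(count, limit):
    """How many more jobs could start before a (hypothetical) limit is reached."""
    return max(limit - count, 0)
```

For example, ''per_user, per_account = count_running(running_jobs_output())'' followed by ''headroom(per_user["yourname"], 10)'' reports the remaining slots under an assumed per-user limit of 10.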
slurm_manage.txt · Last modified: 2019/10/10 17:18 by meesters