start:working_on_mogon:slurm_manage

Differences

This shows you the differences between two versions of the page.

start:working_on_mogon:slurm_manage [2020/07/01 17:35]
meesters [Pending Reasons]
start:working_on_mogon:slurm_manage [2021/12/17 14:41] (current)
jrutte02 [Modifying Pending Jobs]
Line 40: Line 40:
  
 For more information see ''man scontrol''.
 +
 +====== Job State Codes ======
 +
 +
 +^ Status ^ Code ^ Description^
 +| COMPLETED | ''CD'' | The job has completed successfully. |
 +| COMPLETING | ''CG'' | The job is finishing but some processes are still active.|
 +| FAILED | ''F'' | The job terminated with a non-zero exit code and failed to execute.|
 +| PENDING | ''PD'' | The job is waiting for resource allocation. It will eventually run.|
 +| PREEMPTED | ''PR'' | The job was terminated because of preemption by another job.|
 +| RUNNING | ''R'' | The job is currently allocated to a node and is running.|
 +| SUSPENDED | ''S'' | A running job has been stopped with its cores released to other jobs.|
 +| STOPPED | ''ST'' | A running job has been stopped with its cores retained.|
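
The state codes in the table above can be turned into a small lookup helper, for example when post-processing ''squeue'' output in a script. The following is only a minimal sketch: the names ''JOB_STATES'', ''ACTIVE_CODES'', ''state_name'' and ''is_active'' are hypothetical, and the grouping into "active" states is our own assumption, not something Slurm defines.

```python
# Compact Slurm job state codes mapped to their long names,
# transcribed from the table above.
JOB_STATES = {
    "CD": "COMPLETED",
    "CG": "COMPLETING",
    "F": "FAILED",
    "PD": "PENDING",
    "PR": "PREEMPTED",
    "R": "RUNNING",
    "S": "SUSPENDED",
    "ST": "STOPPED",
}

# Codes of jobs that have not finished yet.
# This grouping is an assumption made for illustration only.
ACTIVE_CODES = {"CG", "PD", "R", "S", "ST"}


def state_name(code):
    """Return the long state name for a compact code, or 'UNKNOWN'
    if the code is not listed in the table."""
    return JOB_STATES.get(code.strip().upper(), "UNKNOWN")


def is_active(code):
    """Return True if the code belongs to a job that has not finished yet."""
    return code.strip().upper() in ACTIVE_CODES


if __name__ == "__main__":
    for code in ("PD", "R", "CD", "XX"):
        print(code, state_name(code), is_active(code))
```

Codes that are missing from the table deliberately map to ''UNKNOWN'' instead of raising an error, since Slurm knows more states than the ones listed here.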
  
 ====== Pending Reasons ======
Line 52: Line 65:
 | ''QOSMaxJobsPerAccountLimit'' | For certain partitions the number of running jobs per account is limited. |
 | ''QOSGrpGRESRunMinutes'' | For certain partitions the generic resources (e.g. GPUs) are limited. See [[:start:working_on_mogon:partitions#gpu_queues|GPU Queues]] |
-| ''QOSGrpMemLimit'' | the requested partition is limited in the fraction of resources it can take from the cluster and this amount has been reached: jobs need to end, before new may start.|
-| ''QOSGrpCpuLimit'' | the requested partition is limited in the fraction of resources it can take from the cluster and this amount has been reached: jobs need to end, before new may start.|
-| ''Resources'' | while the partition may allow to take the resources you requested, it cannot not -- at the time -- provide the nodes to run on (e.g. because of a memory request which cannot be satisfied).|
-| ''ReqNodeNotAvail'' | simply means that no node with the required resources is available. SLRUM will list //all// non-available nodes, which can be confusing. This reason is similar to ''Priority'' as it means that a specific job has to wait for a resource to be released.|
 +| ''QOSGrpMemLimit'' | The requested partition is limited in the fraction of resources it can take from the cluster and this amount has been reached: jobs need to end before new ones may start. |
 +| ''QOSMinMemory'' | The job isn't requesting enough memory for the requested partition. |
 +| ''QOSGrpCpuLimit'' | The requested partition is limited in the fraction of resources it can take from the cluster and this amount has been reached: jobs need to end before new ones may start. |
 +| ''Resources'' | The job is eligible to run but resources aren't available at this time. This usually just means that your job will start next once nodes are done with their current jobs. |
 +| ''ReqNodeNotAvail'' | No node with the required resources is available. SLURM will list //all// non-available nodes, which can be confusing. This reason is similar to ''Resources'' as it means that a specific job has to wait for a resource to be released. |
  
 And then there are limitations due to the number of jobs a group (a.k.a. account) may run at a given time. More information on partitions can be found [[:start:working_on_mogon:partitions|on their respective wiki site]].
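
To see at a glance why your jobs are waiting, the pending reasons can be grouped in a script. The sketch below assumes text in the shape produced by ''squeue -h -o "%i %t %r"'' (job ID, compact state code, reason). The sample data is made up, the function name ''pending_by_reason'' is hypothetical, and we assume that a reason such as ''ReqNodeNotAvail'' may be followed by a comma and a node list, which is cut off here.

```python
from collections import defaultdict

# Made-up sample in the assumed shape of `squeue -h -o "%i %t %r"`:
# job ID, compact state code, reason.
SAMPLE = """\
1001 PD Resources
1002 PD QOSGrpCpuLimit
1003 R None
1004 PD ReqNodeNotAvail, UnavailableNodes:node[01-03]
1005 PD Resources
"""


def pending_by_reason(text):
    """Group the IDs of pending jobs by their pending reason."""
    groups = defaultdict(list)
    for line in text.splitlines():
        # Split into at most three fields so that a reason
        # containing spaces stays in one piece.
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip empty or malformed lines
        job_id, state, reason = parts
        if state != "PD":
            continue  # only pending jobs carry a pending reason
        # Drop any detail after the first comma (e.g. the node list).
        key = reason.split(",")[0].strip()
        groups[key].append(job_id)
    return dict(groups)


if __name__ == "__main__":
    for reason, jobs in pending_by_reason(SAMPLE).items():
        print(f"{reason}: {', '.join(jobs)}")
```

Cutting the reason at the first comma keeps the long list of non-available nodes out of the summary, which is the part of ''ReqNodeNotAvail'' that tends to confuse.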
  • start/working_on_mogon/slurm_manage.1593617721.txt.gz
  • Last modified: 2020/07/01 17:35
  • by meesters