Login-Nodes
Calculations must be submitted as jobs; they must not be run on the login nodes. Processes running directly on these nodes should be limited to tasks such as editing, data transfer and management, data analysis, compiling code, and debugging, as long as they are not resource intensive (memory, CPU, network and/or I/O). Any resource-intensive work must be run on the compute nodes through the batch system. To give everyone a fair share of the login node resources, we have implemented the following limits:
Resource limits on login nodes
| Resource  | Limit |
|-----------|-------|
| Memory    | 10 GB |
| CPU cores | 4     |
Therefore, any process that consumes excessive resources on a login node may be killed, especially when it begins to impact other users on that node. If a process causes significant problems on the system, it will be killed immediately and the user will be contacted by email.
SLURM commands are resource intensive as well: even when the load does not show up on the login node itself, they are the user's interface to the scheduling system, so every call puts work on the scheduler. Please avoid placing them in loops such as
for i in $(seq ...); do sbatch ... cmd $i; done
This could be replaced by a job array.
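A minimal sketch of the same pattern as a job array (the job name, the array range, and the command `cmd` are placeholders; adapt them to your case):

```bash
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-10            # one array task per former loop iteration

# $SLURM_ARRAY_TASK_ID takes the role of the loop variable $i
cmd $SLURM_ARRAY_TASK_ID
```

Submitted once with `sbatch array_example.sh`, this creates all tasks with a single call to the scheduler instead of many separate sbatch calls.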
Likewise, using
watch squeue
to act on a job's status puts strain on the scheduler. If the goal is to check a job's status before launching follow-up work, this can be implemented with job dependencies.
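A minimal sketch of such a dependency chain (the script names are placeholders):

```bash
# Submit the first job and capture its ID; --parsable prints only the job ID
jobid=$(sbatch --parsable preprocess.sh)

# The second job starts only after the first one has finished successfully
sbatch --dependency=afterok:${jobid} analysis.sh
```

The scheduler then starts the follow-up job on its own, with no polling required on your side.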
Also, the scheduler needs time to work: the default interval between watch cycles (2 seconds) is shorter than the scheduler's update interval, so polling that often yields no new information.
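If you do want to monitor your jobs interactively, a much longer interval is sufficient; a minimal sketch (the 60-second interval is an arbitrary, scheduler-friendly choice):

```bash
# Poll the queue once a minute instead of every 2 seconds
watch -n 60 squeue -u $USER
```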
There is almost always a better solution; do not hesitate to ask us.