training_and_outreach:ticket_system

Differences

This shows you the differences between two versions of the page.

training_and_outreach:ticket_system [2019/04/01 10:59]
meesters created
training_and_outreach:ticket_system [2019/04/01 11:06]
meesters
Line 9: Line 9:
 If you write an eMail, please try to include **as much** information as possible, **including**, but not limited to:
  
-  * The //job command line// or the //job script// (''sbatch ...'')
+  * The //job command line// or the //job script// (the one invoked with ''sbatch ...'')
   * JobIDs
   * The //environment// you started the job in (at least the output of ''module list'', maybe even ''env | sort'')
   * If possible, the whole job //output//, or if it is too big, the relevant output from the batch system (at the beginning and the end) and any //error messages// you encounter.
 +
 +===== Investigating a running Job =====
 +
 +Sometimes particular jobs cause problems. For us to investigate a running job, it has to run during working hours and start at a time we can control. You can arrange this by submitting your job with the ''--hold'' flag, adding it on the command line:
 +
 +<code bash>
 +$ sbatch --hold ...
 +</code>
 +
 +If you then notify us by mail to [[hpc@uni-mainz.de]], we can release the job at any time and investigate it. Be sure to give the path to the expected job output, too.
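
To make that mail easier to write, the JobID and the output path can be collected at submission time. The following is only a minimal sketch: the helper names ''extract_jobid'' and ''held_job_report'' and the script name ''job.sh'' are hypothetical, and the output path assumes Slurm's default ''slurm-<jobid>.out'' in the submission directory, which your job script may override.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and job.sh are hypothetical, not part of Slurm.

# Extract the numeric job ID from sbatch's usual confirmation line,
# "Submitted batch job <id>", read from standard input.
extract_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Print a short plain-text summary to paste into the support mail.
# $1 = job ID, $2 = path to the expected job output
held_job_report() {
    printf 'JobID: %s\n' "$1"
    printf 'Expected output: %s\n' "$2"
}

# Intended use on the cluster (commented out, since it needs Slurm):
#   jobid=$(sbatch --hold job.sh | extract_jobid)
#   held_job_report "$jobid" "$PWD/slurm-$jobid.out"
```

The two printed lines can then be pasted into the mail together with the other information listed above.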
 +
 +<WRAP center round info 90%>
 +If you suspect that a particular node is causing problems (perhaps there is a hardware problem) and want us to investigate((We do have automated failure detection in place, but such systems have their weak spots.)), you can submit with:
 +<code bash>
 +$ sbatch --hold -w <nodename> ...
 +</code>
 +</WRAP>
 +
 +Thank you for your cooperation!
 +