checkpoint_restart

Differences

This shows you the differences between two versions of the page.

checkpoint_restart [2017/07/04 05:31]
meesters created
checkpoint_restart [2017/09/18 11:34]
meesters [Checkpointing & Restarting Jobs]
Line 1: Line 1:
 ====== Checkpointing & Restarting Jobs ====== ====== Checkpointing & Restarting Jobs ======
 +
 +<WRAP center round todo 90%>
 +''This feature is experimental. We hope to provide more information soon.''
 +</WRAP>
  
 ===== Motivation & Introduction ===== ===== Motivation & Introduction =====
  
-Introducing wall times is one measure to ensure balanced distribution of resources on every HPC cluster. Yet, some applications need to have extremely long run times. The solution is [[https://en.wikipedia.org/wiki/Application_checkpointing|Application Checkpointing]], where a snapshot of the running application is saved in pre-defined intervals.
+Introducing wall times is one measure to ensure balanced distribution of resources on every HPC cluster. Yet, some applications need to have extremely long run times. The solution is [[https://en.wikipedia.org/wiki/Application_checkpointing|Application Checkpointing]], where a snapshot of the running application is saved in pre-defined intervals. This makes it possible to restart an application from the point at which the most recent checkpoint was saved.
  
 <WRAP center round info 95%> <WRAP center round info 95%>
checkpoint_restart.txt · Last modified: 2017/09/18 11:34 by meesters