checkpoint_restart [2017/07/04 05:31] meesters created
====== Checkpointing & Restarting Jobs ======

===== Motivation & Introduction =====

Enforcing wall-time limits is one way to ensure a balanced distribution of resources on an HPC cluster. Yet some applications need extremely long run times. The solution is [[https://

<WRAP center round info 95%>
We intend to eventually provide checkpointing integrated with Slurm. Until then, only third-party tools are offered, without additional documentation on our part.
</WRAP>

===== Third party tools =====

==== Checkpointing multithreaded applications with dmtcp ====

[[http://

We provide at least one module for dmtcp; check for:
<code bash>
tools/
</code>
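The following is a minimal sketch, not a site-provided template, of a Slurm job script that uses dmtcp to work around the wall-time limit: it checkpoints periodically and, when resubmitted, restarts from the checkpoint images instead of starting over. The job name, the checkpoint directory ''checkpoints'', the application ''./my_app'' and the variables ''CKPT_DIR'', ''CKPT_INTERVAL'' and ''APP'' are hypothetical; the exact module name is left open. It also assumes a single-node job in which ''dmtcp_launch'' may start its own coordinator on the default port, so several such jobs sharing one node may need distinct coordinator ports.

<code bash>
#!/bin/bash
#SBATCH --job-name=long_run          # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=24:00:00              # must fit within the partition's wall-time limit

# Assumption: the dmtcp module has been loaded, e.g.
#   module load tools/...            (exact module name is site-specific)

CKPT_DIR="${CKPT_DIR:-$PWD/checkpoints}"   # hypothetical checkpoint directory
CKPT_INTERVAL="${CKPT_INTERVAL:-3600}"     # seconds between automatic checkpoints
APP=(./my_app --input data.in)             # hypothetical application and arguments

# Print "restart" if dmtcp checkpoint images exist in directory $1, else "launch".
choose_mode() {
    if compgen -G "$1/ckpt_*.dmtcp" > /dev/null; then
        echo restart
    else
        echo launch
    fi
}

run() {
    mkdir -p "$CKPT_DIR"
    if [[ "$(choose_mode "$CKPT_DIR")" == restart ]]; then
        # Resume from the checkpoint images found in CKPT_DIR.
        dmtcp_restart --interval "$CKPT_INTERVAL" "$CKPT_DIR"/ckpt_*.dmtcp
    else
        # First run: start the application under dmtcp control.
        dmtcp_launch --interval "$CKPT_INTERVAL" --ckptdir "$CKPT_DIR" "${APP[@]}"
    fi
}

# Only do work inside a Slurm allocation, so the helpers can be sourced for testing.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    run
fi
</code>

Submit the script with ''sbatch''; if the job is killed at the wall-time limit, submit it again and it resumes from the last checkpoint. Work done after the most recent checkpoint is lost, so ''CKPT_INTERVAL'' should be well below the requested wall time, and the checkpoint directory should be removed once the application has finished, or a later submission would restart an old state.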