Input and Output on HPC Systems

This page covers general aspects of I/O in workloads on Mogon I/II; more information on the file systems themselves can be found here.

Scientific applications typically perform I/O on a parallel file system in one of two ways:

  • Shared-file (N-to-1): A single file is created, and all application tasks write to that file (usually to completely disjoint regions).
    • This simplifies bookkeeping: the application only has to keep track of one file.
    • Concurrent writes to the same file may cause lock contention and degrade performance.
  • File-per-process (N-to-N): Each application instance or task creates (or reads) a separate file and writes only to that file.
    • This may avoid lock contention at the application level, but many tasks writing their files to one destination (e.g. one directory) increases the stress on the file system and may trigger locking at the file-system level.
    • Such applications cannot be restarted with a different number of tasks without reorganizing their files, because each file belongs to one specific task.
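
To make the two patterns concrete, here is a minimal Python sketch. It assumes a POSIX system (os.pwrite is Unix-only) and uses threads as stand-ins for the tasks of a parallel application; real codes would typically use MPI-IO (e.g. MPI_File_write_at) or a library such as HDF5 for the shared-file case. The function names, the file names and BLOCK_SIZE are hypothetical and chosen only for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical fixed size of the region reserved for each task in the shared file.
BLOCK_SIZE = 4096


def write_shared(path: Path, rank: int, payload: bytes) -> None:
    """N-to-1: every task writes into its own disjoint region of one shared file."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload does not fit into this task's region")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # The offset depends only on the rank, so regions never overlap.
        os.pwrite(fd, payload, rank * BLOCK_SIZE)
    finally:
        os.close(fd)


def write_per_process(directory: Path, rank: int, payload: bytes) -> None:
    """N-to-N: every task creates and writes its own file (hypothetical naming scheme)."""
    (directory / f"out.{rank:05d}").write_bytes(payload)


def run(n_tasks: int, directory: Path) -> None:
    """Let n_tasks simulated tasks write their data using both patterns."""
    shared = directory / "shared.dat"
    payloads = [f"rank {rank}\n".encode() for rank in range(n_tasks)]
    # Threads stand in for MPI ranks / independent processes in this sketch.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(write_shared, shared, r, p) for r, p in enumerate(payloads)]
        futures += [pool.submit(write_per_process, directory, r, p) for r, p in enumerate(payloads)]
        for future in futures:
            future.result()  # re-raise any error from a worker
```

Because every task's offset in shared.dat is computed from its rank alone, the data can later be read back with any task count that knows this layout, whereas the out.NNNNN files remain bound to the original number of tasks.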

If you suspect I/O problems, please contact the HPC team. Currently, there is no straightforward way for users to analyze the I/O behavior of (third-party) applications.

We may provide more tools in the foreseeable future.

The statements above may seem somewhat abstract, particularly when you have to use third-party applications and cannot influence their architecture.

However, a few rules of thumb can be given:

  • Pooling short jobs is generally a good idea for scheduling and for organizing your workflow. If all of these application instances read the same input files, staging these files in to node local scratch or even into RAM may solve performance issues: with a temporary local copy of the input, the global file system no longer has to keep track of the accesses to these files.
  • Avoid having many processes keep files open (within the same directory). Violating this rule may cause delays, because the global file system needs to coordinate every writing process. A possible solution is to write into the job directory (see node local scratch) and to copy this output to the global file system once a process has finished writing and released the file.
  • Avoid writing many small files: for millions of small files, the overhead of keeping track of their metadata can exceed the size of the files themselves. The global file system is not optimized for this.
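
These rules can be combined in a single pooled job. The following Python sketch is a hypothetical illustration, not a tool provided on Mogon: it copies a shared input file once to a node-local directory, runs several short tasks that read only that local copy and write their small result files locally, and finally packs all results into one archive that is copied to the global file system. The parameter local_dir stands for the node-local job directory (its actual location is described on the node local scratch page); run_task is a placeholder for the real application, and all names are assumptions.

```python
import shutil
import tarfile
from pathlib import Path


def stage_in(global_input: Path, local_dir: Path) -> Path:
    """Copy a shared input file to node-local storage once and return the local copy."""
    local_copy = local_dir / global_input.name
    if not local_copy.exists():  # copy only once, then reuse for every task
        shutil.copy2(global_input, local_copy)
    return local_copy


def run_task(task_id: int, local_input: Path, local_out: Path) -> None:
    """Hypothetical placeholder for one short job: reads local input, writes a small local result."""
    text = local_input.read_text()
    (local_out / f"result_{task_id:04d}.txt").write_text(f"{task_id}: {len(text)}\n")


def stage_out(local_out: Path, global_dir: Path, archive_name: str) -> Path:
    """Bundle many small local result files into one archive and copy it to the global file system."""
    local_archive = local_out.parent / archive_name
    with tarfile.open(local_archive, "w:gz") as tar:
        tar.add(local_out, arcname=local_out.name)
    target = global_dir / archive_name
    shutil.copy2(local_archive, target)  # one large file instead of many small ones
    return target


def pooled_job(global_input: Path, global_results: Path, local_dir: Path, n_tasks: int) -> Path:
    """Run n_tasks short tasks against node-local copies; touch the global file system only twice."""
    local_input = stage_in(global_input, local_dir)
    local_out = local_dir / "results"
    local_out.mkdir(exist_ok=True)
    for task_id in range(n_tasks):  # sequential here; a real job may run tasks in parallel
        run_task(task_id, local_input, local_out)
    return stage_out(local_out, global_results, "results.tar.gz")
```

In this design the global file system sees one read of the input and one write of the archive, regardless of how many tasks run inside the job.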

Last modified: 2017/11/29 19:35