Archiving

Metadata Stewardship with Schemas

To facilitate populating iRODS collections with metadata according to schemas, we provide a helper module.

You can create a schema file with an online tool:

JSON-Schemas to iRODS

Loading the module tools/imcs provides a script that can be called like this:

schema2avu -j <json_file> -c <iRODS-path to iRODS collection>

Nested schemas for complex data are currently not supported. Because such nesting may be data specific, you may ask the HPC team to add the feature required for your data.

Preparing to archive

We suggest compressing and annotating data before archiving it with the iRODS archive:

  • compressing saves transfer time
  • annotation eases the interpretation of retrieved data (if an archive needs to be pulled back).

Compressing Directories

A smaller directory can be compressed in the standard way:

# assuming gzip compression
tar -czf <archivename>.tar.gz <directoryname>
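
Before an archive is transferred, it can be worth checking that it is readable. The following is a minimal sketch and not part of the site tooling: the function name verify_archive is made up for illustration, and GNU tar and gzip are assumed to be available.

```shell
# verify_archive is a hypothetical helper, not a site-provided command.
# It returns 0 only if the gzip stream is intact and tar can list its members.
verify_archive() {
    archive="$1"
    # gzip -t tests the integrity of the compressed stream
    gzip -t "$archive" 2>/dev/null || return 1
    # tar -tzf lists the members without extracting them
    tar -tzf "$archive" >/dev/null 2>&1 || return 1
    return 0
}
```

It could be used as: verify_archive <archivename>.tar.gz && echo "archive is readable"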

You may speed up the compression on a login node by using a parallel compression tool like pigz:

module load tools/pigz
tar cf - <directoryname> | pigz -p 4 > <archivename>.tar.gz
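
If the same commands should also work where pigz is not loaded, the pipeline can be wrapped with a fallback to plain gzip. This is only a sketch under assumptions: compress_dir is a hypothetical helper name, and the default of 4 threads simply mirrors the example above.

```shell
# compress_dir is a hypothetical helper; the tools/pigz module does not provide it.
# Usage: compress_dir <directoryname> <archivename>.tar.gz [threads]
compress_dir() {
    dir="$1"
    archive="$2"
    threads="${3:-4}"
    if command -v pigz >/dev/null 2>&1; then
        # pigz writes gzip-compatible output using several threads
        tar -cf - "$dir" | pigz -p "$threads" > "$archive"
    else
        # fall back to single-threaded gzip if pigz is not on the PATH
        tar -cf - "$dir" | gzip > "$archive"
    fi
}
```

Both branches produce a regular .tar.gz file, so the result can be unpacked with tar -xzf either way.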

If the directory is too big to compress on a login node, you can run an interactive job instead:

module load tools/pigz
# an interactive job might look like:
srun -A <your account> -p parallel -C broadwell -t <appropriate time> -N 1 -c40 --pty bash -i
  <some node>:$ tar -I pigz -cf <archivename>.tar.gz <directoryname>
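
Inside a job, the number of compression threads can be matched to the cores that were requested. The sketch below assumes that Slurm exports SLURM_CPUS_PER_TASK when -c is given; the helper name pigz_threads and the fallback value of 4 are illustrative choices, not site conventions.

```shell
# pigz_threads is a hypothetical helper that prints a thread count for pigz.
pigz_threads() {
    # SLURM_CPUS_PER_TASK is set by Slurm when -c / --cpus-per-task is requested.
    # Outside a job (variable unset or empty) fall back to 4 threads.
    echo "${SLURM_CPUS_PER_TASK:-4}"
}
```

It could be used as: tar cf - <directoryname> | pigz -p "$(pigz_threads)" > <archivename>.tar.gz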