

Archiving with iRODS

iRODS stands for integrated Rule-Oriented Data System and is open-source data management software. It acts as a kind of virtual file system with its own terminology:

  • folders are called collections; they may contain further subcollections
  • files are called data objects.

All commands from the iRODS command line tools start with an 'i'.

On request, it is still possible to apply for a TSM account to store data on tape.

What belongs in an archive?

  • (Raw) input data
  • Final publishable or already published data
  • Zipped (git) repositories of the scripts/software used to process the data.

Intermediate results and work in progress do not belong in an archive.

If a directory (or collection) consists of many small files, those files should be bundled and compressed into one archive file. iRODS works best for files larger than 5 GB.
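A minimal local sketch of such bundling, using Python's standard tarfile module. The function name bundle_directory and the choice of gzip compression are illustrative, not a ZDV requirement. It writes a single .tar.gz file that can then be uploaded as one data object.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> list[str]:
    """Pack everything below src_dir into one gzip-compressed tar archive.

    Returns the member names stored in the archive, so the content can be
    checked before the single archive file is uploaded with iput -k.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname=src.name stores paths relative to the parent directory,
        # so the archive unpacks into a folder with the same name.
        tar.add(str(src), arcname=src.name)
    # Re-open the finished archive and list its members as a sanity check.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, bundle_directory("results/run_01", "run_01.tar.gz") followed by iput -k run_01.tar.gz; both names are placeholders.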

iRODS Account

There is no need to apply for an iRODS archiving account. Every user of Mogon I/II automatically gets access to iRODS. If your account is associated with a Mogon project, you also get read/write access to the iRODS project collection (/zdv/project/<PROJECT NAME>).

Data persistence

Collections in the home collection of an individual user will be deleted once the account is deleted. Only the /zdv/project/<PROJECT NAME> collections will be archived for an appropriate period (default: 10 years, as recommended by the DFG, Guideline 17).

The data saved to the project collection above is owned by the group. This means that after a user leaves the project, the other group members can still access the data, as long as the ACLs are not modified.

Each user has a hidden directory ${HOME}/.irods in their Mogon home directory. It contains the file irods_environment.json with the connection information for the iRODS archive. The template for irods_environment.json is shown below:

{
    "irods_client_server_negotiation": "request_server_negotiation",
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_authentication_scheme": "KRB",
    "irods_host": "irods-test-01.zdv.uni-mainz.de",
    "irods_port": 1247,
    "irods_user_name": "<$USER>",
    "irods_zone_name": "zdv",
    "irods_encryption_key_size": 32,
    "irods_encryption_salt_size": 8,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_algorithm": "AES-256-CBC"
}

Authentication is done via Kerberos. To access the iRODS archive, use the kinit command and enter your password. Please do not use iinit: it will not work, and it changes your `${HOME}/.irods/irods_environment.json` file. If this has happened, remove the folder `${HOME}/.irods` and log in again; it will be restored on login.

Security warning

If you have run iinit, an additional file '.irodsA' may have been created in the folder '.irods'. Please remove this file! It contains the password you entered in a decryptable form.
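Both problems described above (a modified irods_environment.json and a leftover .irodsA) can be checked with a small script. The helper check_irods_dir is hypothetical, a sketch that assumes the environment file follows the template shown above; it only reports problems and does not delete anything.

```python
from __future__ import annotations

import json
from pathlib import Path


def check_irods_dir(irods_dir: str) -> list[str]:
    """Report problems in an iRODS client directory such as ~/.irods.

    Only reports; removing files is left to the user.
    """
    base = Path(irods_dir)
    problems = []
    # iinit may leave .irodsA behind; it holds the password in decryptable form.
    if (base / ".irodsA").exists():
        problems.append(".irodsA present: remove this file")
    env_file = base / "irods_environment.json"
    if not env_file.is_file():
        problems.append("irods_environment.json missing: remove ~/.irods and log in again")
        return problems
    env = json.loads(env_file.read_text())
    # The template authenticates via Kerberos ("KRB"); a file changed by
    # iinit may no longer match it.
    if env.get("irods_authentication_scheme") != "KRB":
        problems.append("irods_authentication_scheme is not KRB")
    return problems
```

Call it as check_irods_dir(os.path.expanduser("~/.irods")); an empty list means neither problem was found.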

Commands overview

Here is a short summary of the most important iRODS commands and some of their important command line parameters.

As mentioned above, iRODS is a kind of virtual file system. The following commands can be used for navigation.

| Command | Parameters | Description |
|---|---|---|
| ipwd | | print the current iRODS working directory (collection) |
| ils | -l, -L, -A | list an iRODS collection (-l: with details; -L: more details; -A: ACLs) |
| icd | <target> | change the current iRODS collection |
| imkdir | -p <coll> | create a new collection (directory; -p: with parents) |

Each user gets a personal home collection under /zdv/home/${USER} and access to the associated Mogon I/II projects under /zdv/project/<PROJECT NAME>.

Accessible folders:

  • /zdv/home/${USER} private directory
  • /zdv/trash/${USER} private trash bin
  • /zdv/project/<PROJECT NAME> project directory
  • /zdv/home/public every registered user can read/write/delete

Archiving

Uploading data to the iRODS-Archive is done with the command iput.

| Command | Parameters | Description |
|---|---|---|
| iput | -k, -r | upload files/folders (-k: compute checksums on upload; -r: recursive) |
| ichksum | -r <object/collection> | compute and store checksums (-r: recursive) |

The checksum is calculated server side, and we highly recommend enabling it directly on upload. You can also compute it later with ichksum -K <filename>. It creates a checksum equivalent to the command sha256sum <local filename> | cut -d " " -f 1 | xxd -r -p | base64, which you can compare to ensure data integrity. The checksums can be queried with ils -L and ichksum. However, if you do not compute the checksum on upload with iput -k, the replica on the TSM resource (tsmResc) will have no checksum, as the example below shows.
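The shell pipeline above can also be reproduced without xxd. The following sketch (the names irods_sha2_checksum and matches_irods are illustrative) computes the same value in Python: the base64 encoding of the raw SHA-256 digest, with the sha2: prefix that ils -L and ichksum print.

```python
import base64
import hashlib


def irods_sha2_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a local checksum in the format shown by ils -L / ichksum.

    Equivalent to: sha256sum FILE | cut -d " " -f 1 | xxd -r -p | base64
    i.e. base64 of the raw (binary) SHA-256 digest, prefixed with "sha2:".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archive files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def matches_irods(path: str, irods_checksum: str) -> bool:
    """Compare a local file against a checksum string copied from ils -L."""
    return irods_sha2_checksum(path) == irods_checksum.strip()
```

For example, matches_irods("hello_world.txt", "sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=") compares a local copy against the checksum shown in the example below.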

As mentioned above, many small files should be bundled into one archive. You can still extract an uploaded tar archive on the server and index all contained files with the command ibun. Please read its man page for further details (man ibun).

Access control: ''ichmod''

ichmod takes three mandatory parameters, in this order:

| Parameter | Description |
|---|---|
| null, read, write or own | access right |
| user or group | to whom |
| data object or collection | for what |

'-r' is a useful optional parameter for applying ACL modifications recursively.
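Because the argument order matters (access right, then user or group, then target), a script that sets ACLs can assemble the call explicitly. A minimal sketch: ichmod_command is a hypothetical helper, and only the four access levels listed above are accepted.

```python
from __future__ import annotations

ACCESS_LEVELS = ("null", "read", "write", "own")


def ichmod_command(level: str, user_or_group: str, target: str,
                   recursive: bool = False) -> list[str]:
    """Build an ichmod argument list: [-r] level user_or_group target."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    cmd = ["ichmod"]
    if recursive:
        cmd.append("-r")  # apply the ACL change to the whole collection tree
    cmd += [level, user_or_group, target]
    return cmd
```

The resulting list can be passed to subprocess.run(..., check=True) on a node where the iRODS command line tools are available, e.g. ichmod_command("read", "<USER>", "/zdv/project/<PROJECT NAME>/results", recursive=True), where user and path are placeholders.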

Retrieving: ''iget''

Getting data back from the iRODS archive is done via iget.

| Parameter | Description |
|---|---|
| -r | recursive |
| -f | overwrite existing local files |

Example

[user@login01 ~]$ kinit
Password for user@UNI-MAINZ.DE: 
 
[user@login01 ~]$ ipwd
/zdv/home/user
 
[user@login01 ~]$ ils /zdv/home/public
/zdv/home/public:
  hello_world.txt
[user@login01 ~]$ ils -l /zdv/home/public
/zdv/home/public:
  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
[user@login01 ~]$ ils -L /zdv/home/public
/zdv/home/public:
  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
[user@login01 ~]$ ils -A /zdv/home/public
/zdv/home/public:
        ACL - g:public#zdv:own   
        Inheritance - Disabled
  hello_world.txt
        ACL - public#zdv:read object   rods#zdv:own   
[user@login01 ~]$ ils -LA /zdv/home/public
/zdv/home/public:
        ACL - g:public#zdv:own   
        Inheritance - Disabled
  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
        ACL - public#zdv:read object   rods#zdv:own   
  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
        ACL - public#zdv:read object   rods#zdv:own   
 
[user@login01 ~]$ iget /zdv/home/public/hello_world.txt
[user@login01 ~]$ ls -l hello_world.txt 
-rw-r----- 1 user zdv 24 Aug 19 11:18 hello_world.txt
 
[user@login01 ~]$ iput hello_world.txt 
[user@login01 ~]$ ils -L
/zdv/home/user:
  user          0 replResc;compResc;netappResc           24 2019-08-19.11:20 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/user/hello_world.txt
  user          1 replResc;compResc;tsmResc           24 2019-08-19.11:20 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/user/hello_world.txt
 
[user@login01 ~]$ ichksum hello_world.txt
    hello_world.txt    sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=
Total checksum performed = 1, Failed checksum = 0
[user@login01 ~]$ ils -L
/zdv/home/user:
  user          0 replResc;compResc;netappResc           24 2019-08-19.11:20 & hello_world.txt
    sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=    generic    /fsapp/iRODS/Vault/home/user/hello_world.txt
  user          1 replResc;compResc;tsmResc           24 2019-08-19.11:20 & hello_world.txt
        generic    /fsapp/iRODS/Vault/home/user/hello_world.txt
 
[user@login01 ~]$  sha256sum hello_world.txt | cut -d " " -f 1 | xxd -r -p | base64
XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=
 
[user@login01 ~]$ irm -f hello_world.txt 
[user@login01 ~]$ iput -k hello_world.txt 
[user@login01 ~]$ ils -L
/zdv/home/user:
  user          0 replResc;compResc;netappResc           24 2019-08-19.11:27 & hello_world.txt
    sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=    generic    /fsapp/iRODS/Vault/home/user/hello_world.txt
  user          1 replResc;compResc;tsmResc           24 2019-08-19.11:28 & hello_world.txt
    sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=    generic    /fsapp/iRODS/Vault/home/user/hello_world.txt

Metadata: ''imeta''

Metadata is stored as so-called AVU triplets (Attribute, Value, Unit). The first two fields (A and V) are mandatory and must not be empty; the unit is optional. A and V are defined as VARCHAR(2700) and U as VARCHAR(250), i.e. they are all text fields with a maximum length of 2700 and 250 characters, respectively. They may also contain JSON, XML or YAML as text.
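Before attaching metadata from a script, the constraints above can be checked locally. The helper make_avu is a hypothetical sketch: it serializes non-string values (e.g. a dict) to JSON text and rejects empty attributes or values as well as over-long fields, counting length in characters as described above.

```python
import json

MAX_ATTR_VALUE = 2700  # VARCHAR(2700) for attribute and value
MAX_UNIT = 250         # VARCHAR(250) for the unit


def make_avu(attribute, value, unit=""):
    """Return a validated (attribute, value, unit) triplet.

    Non-string values are stored as compact JSON text.
    """
    if not isinstance(value, str):
        value = json.dumps(value, separators=(",", ":"))
    if not attribute or not value:
        raise ValueError("attribute and value are mandatory and must not be empty")
    if len(attribute) > MAX_ATTR_VALUE or len(value) > MAX_ATTR_VALUE:
        raise ValueError("attribute or value longer than 2700 characters")
    if len(unit) > MAX_UNIT:
        raise ValueError("unit longer than 250 characters")
    return attribute, value, unit
```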

Editing

| Parameter | Description |
|---|---|
| add, set, rm, ls or cp | subcommand, see the next table for details (ls and cp do not require the AVU triplet) |
| -d dataObject or -C collection | the data object (file) or collection (directory) to query or edit |
| Attribute Value [Unit] | AVU triplet, where the unit is optional |

Subcommands:

| Command | Description |
|---|---|
| add | add an AV(U) triplet |
| set | set a single value |
| rm | remove an AV(U) triplet |
| ls | list existing metadata; if an attribute is given, only the metadata of that attribute |
| cp | copy existing metadata; needs a source and a target (e.g. imeta cp -d -C source_object target_collection) |

An Example

The following command lists the metadata automatically associated with the previously uploaded file hello_world.txt:

imeta ls -d hello_world.txt

The output of the query is:

AVUs defined for dataObj hello_world.txt:
attribute: AccessRights
value: closed
units: 
----
attribute: Creator
value: Steinkamp, J.
units: 
----
attribute: Date
value: 1566206896
units: 
----
attribute: ExpiryDate
value: 1882430896
units: 
----
attribute: Location
value: Mainz, Germany
units: 
----
attribute: protected
value: false
units: 
----
attribute: Publisher
value: Johannes Gutenberg-University
units: 
[user@login01 ~]$ 
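The Date and ExpiryDate values are plain Unix timestamps. A short sketch to make them readable (to_utc is an illustrative name). In the output above, the two values differ by 316,224,000 seconds, i.e. 3660 days, so the "10 years" here appear to be counted as a fixed number of days rather than calendar years.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> str:
    """Convert a Unix timestamp (as stored in Date/ExpiryDate) to ISO 8601 in UTC."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat()


# Values copied from the imeta ls output above.
date, expiry = "1566206896", "1882430896"
print(to_utc(date))                        # 2019-08-19T09:28:16+00:00
print(to_utc(expiry))                      # 2029-08-26T09:28:16+00:00
print((int(expiry) - int(date)) // 86400)  # 3660
```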

You can now add a title, which is not created automatically:

imeta set -d hello_world.txt Title "Archive of experimental szstem from '$(date)'"

If you query the attribute 'Title' with imeta ls -d hello_world.txt Title, you get:

AVUs defined for dataObj hello_world.txt:
attribute: Title
value: Archive of experimental szstem from 'Mon Aug 19 11:39:48 CEST 2019'
units: 

Adjusting Meta Data

In the example we deliberately made a typo ('szstem'), as you might have noticed. You can correct such glitches with the general syntax:

$ imeta mod -d <filename> <attribute> <old value> v:<new value>

or in our example:

imeta mod -d hello_world.txt Title "Archive of experimental szstem from 'Mon Aug 19 11:39:48 CEST 2019'" v:"Archive of experimental system from 'Mon Aug 19 11:39:48 CEST 2019'"

For further details see man imeta.

Minimum set of Attributes

As you could see above, as many metadata attributes as possible are generated automatically to simplify your work. You can still adjust and extend them to your needs.

  • Title: free text (user input needed)
  • Creator: full user name (created automatically)
  • Publisher: “Johannes Gutenberg-University” (created automatically)
  • Location: “Mainz, Germany” (created automatically)
  • Date: Unix timestamp (created automatically)
  • ExpiryDate: Date + 10 years (created automatically)
  • Type: audio, data set, image, source code, … (user input needed)
  • Format: the file format (e.g. output of the file command) (user input needed)
  • AccessRights: “closed”, “restricted”, “embargoed” or “open” (default: “closed”)
    • AccessConditions: if AccessRights is “restricted” (not yet available)
    • EmbargoDate: if AccessRights is “embargoed” (not yet available)
  • protected: default “false”

Write protection

One attribute should be used with caution: protected (default value: 'false'). If protected is set or modified to 'true' (case-sensitive!), the user can no longer delete or overwrite the object, nor modify most of its metadata attributes. This is intended for cases where data integrity must be ensured, e.g. so that data cannot be changed after publication. Additional metadata attributes can still be edited.

If the dataset is to be FAIR (Findable, Accessible, Interoperable, Reusable), the following attributes are also mandatory:

  • AccessRights: must not be “closed”
  • Identifier: provided by ZDV/UB, only if the attribute “protected” is set (not yet available)
  • License: the license for reuse. Recommended: GPL for code, CC0 for data sets, otherwise CC-BY
  • Subject: any keywords
  • Contributor: co-authors
  • Reference: publication references
  • Description: free text
  • Abstract: free text

Further fields can be inserted. This depends on the scientific field and is the responsibility of the respective researcher or group.
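Whether an object already carries the attributes listed above can be checked once its metadata has been read into a dictionary (e.g. from imeta ls or the REST API). A minimal sketch: the attribute lists mirror the two lists above, Identifier is left out because it is not yet provided, and missing_attributes is a hypothetical helper.

```python
MINIMUM = ["Title", "Creator", "Publisher", "Location", "Date",
           "ExpiryDate", "Type", "Format", "AccessRights", "protected"]
FAIR_EXTRA = ["License", "Subject", "Contributor", "Reference",
              "Description", "Abstract"]


def missing_attributes(avus: dict, fair: bool = False) -> list:
    """List required attributes that are absent or empty in avus."""
    missing = [a for a in MINIMUM if not avus.get(a)]
    if fair:
        missing += [a for a in FAIR_EXTRA if not avus.get(a)]
        # FAIR data must not stay at the default AccessRights "closed".
        if avus.get("AccessRights", "closed") == "closed":
            missing.append("AccessRights != closed")
    return missing
```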

Searching

for filenames: ''ilocate''

[user@login01 ~]$ ilocate -t "hello_world.txt" 
/zdv/home/user/hello_world.txt
/zdv/home/public/hello_world.txt

for metadata: ''imeta qu''

You must specify whether you want to search for a data object (-d) or a collection (-C). If you don't know the exact value you are looking for, you can use the SQL wildcard % together with the like operator. Wildcard matching also works with ilocate.

[user@login01 ~]$ imeta qu -d Title like "Archive%"
collection: /zdv/home/user
dataObj: hello_world.txt
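To predict what a pattern such as "Archive%" will match, the following sketch translates a LIKE pattern into a Python regular expression. It assumes standard, case-sensitive SQL LIKE semantics, where % matches any sequence of characters and _ matches exactly one character; the function names are illustrative. Under this assumption, an underscore in a search pattern (as in hello_world.txt) also matches any single character.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence, including the empty one
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    # match() anchors at the start; \Z anchors at the end.
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def like(value: str, pattern: str) -> bool:
    return like_to_regex(pattern).match(value) is not None
```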

Publishing

For public access, a ticket needs to be created for the collection or data object. For example, for the hello_world.txt file uploaded above:

[user@login01 ~]$ iticket create read hello_world.txt
ticket:ACR2RKDyuZMBRmb

With this ticket and the path, anybody can query information about collections and data objects, and their content, via the provided REST API. For valid URLs, JSON strings are returned.

General information about data objects

[user@login01 ~]$ curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb
{"id":1808764,
 "collectionId":24346,
 "dataName":"hello_world.txt",
 "collectionName":"/zdv/home/jsteinka",
 "dataReplicationNumber":0,
 "dataVersion":0,
 "dataTypeName":"generic",
 "dataSize":24,
 "resourceGroupName":"",
 "resourceName":"netappResc",
 "dataPath":"/fsapp/iRODS/Vault/home/jsteinka/hello_world.txt",
 "dataOwnerName":"jsteinka",
 "dataOwnerZone":"zdv",
 "replicationStatus":"1",
 "dataStatus":"",
 "checksum":"sha2:XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA=",
 "expiry":"00000000000",
 "dataMapId":0,
 "comments":"",
 "createdAt":1566206868000,
 "updatedAt":1566206868000,
 "specColType":"NORMAL",
 "objectPath":""
}
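The size and checksum in this response allow a downloaded copy to be verified. A minimal sketch, assuming the JSON text has already been fetched (e.g. with curl as above) and that a checksum was registered for the replica described in the response; verify_against_rest is a hypothetical helper, and an empty checksum field never matches.

```python
import base64
import hashlib
import json
import os


def verify_against_rest(info_json: str, local_path: str) -> bool:
    """Compare a local file with the dataObject JSON from the REST API."""
    info = json.loads(info_json)
    # Cheap check first: the byte size must match "dataSize".
    if os.path.getsize(local_path) != int(info["dataSize"]):
        return False
    digest = hashlib.sha256()
    with open(local_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    local = "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
    return local == info.get("checksum", "")
```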

Querying the metadata

[user@login01 ~]$ curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt/metadata?ticket=ACR2RKDyuZMBRmb
{"metadataEntries": [
    {"count":1,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"AccessRights",
     "value":"closed",
     "unit":""},
    {"count":2,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"Publisher",
     "value":"Johannes Gutenberg-University",
     "unit":""},
    {"count":3,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"Location",
     "value":"Mainz, Germany",
     "unit":""},
    {"count":4,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"protected",
     "value":"false",
     "unit":""},
    {"count":5,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"Creator",
     "value":"Steinkamp, J.",
     "unit":""},
    {"count":6,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"Date",
     "value":"1566206896",
     "unit":""},
    {"count":7,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"ExpiryDate",
     "value":"1882430896",
     "unit":""},
    {"count":8,
     "lastResult":true,
     "totalRecords":0,
     "attribute":"Title",
     "value":"Archive of experimental system from 'Mon Aug 19 11:39:48 CEST 2019'",
     "unit":""}],
"objectType":"DATA_OBJECT","uniqueNameString":"/zdv/home/jsteinka/hello_world.txt"}

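Because metadataEntries is a list of AVU records, a script will usually want it keyed by attribute. A sketch (metadata_to_dict is an illustrative name) that keeps a list of (value, unit) pairs per attribute, since iRODS allows several AVUs with the same attribute name:

```python
import json


def metadata_to_dict(response_text: str) -> dict:
    """Collapse "metadataEntries" into {attribute: [(value, unit), ...]}."""
    doc = json.loads(response_text)
    result = {}
    for entry in doc.get("metadataEntries", []):
        # count/lastResult/totalRecords are paging fields and are dropped.
        result.setdefault(entry["attribute"], []).append((entry["value"], entry["unit"]))
    return result
```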
Downloading data

The file content can be viewed with curl or downloaded with wget.

curl https://irods-test-01.zdv.uni-mainz.de/irods-rest/rest/fileContents/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb

Retrieve the metadata of a collection

curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/collection/zdv/home/public/helloCollection?ticket=mbyAAFGm7vhUdyM
{
 "collectionId":1808781,
 "collectionName":"/zdv/home/public/helloCollection",
 "objectPath":"",
 "collectionParentName":"/zdv/home/public/",
 "collectionOwnerName":"rods",
 "collectionOwnerZone":"zdv",
 "collectionMapId":"0",
 "collectionInheritance":"",
 "comments":"",
 "info1":"",
 "info2":"",
 "createdAt":1566213731000,
 "modifiedAt":1566213731000,
 "specColType":"NORMAL",
 "children":[]
}

REST-API URL

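As the examples above show, the URL for the REST API consists of:

  • the base URL of the REST service, e.g. https://irods-test.zdv.uni-mainz.de/irods-rest/rest/
  • the resource type: dataObject, collection or fileContents
  • the full iRODS path, e.g. /zdv/home/jsteinka/hello_world.txt
  • optionally /metadata, to query the metadata
  • the ticket as query parameter: ?ticket=<ticket>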

For further information, please read the original iRODS REST documentation.
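Composing ticket URLs by hand is error-prone, so here is a small sketch that builds them in the form used in the examples above. The function rest_url is hypothetical. The default host is taken from the dataObject and collection examples; the fileContents example uses irods-test-01.zdv.uni-mainz.de, hence the base parameter. Percent-encoding of unusual characters in paths is assumed, not verified, to be accepted by the REST service.

```python
from urllib.parse import quote, urlencode

DEFAULT_BASE = "https://irods-test.zdv.uni-mainz.de/irods-rest/rest"
KINDS = ("dataObject", "collection", "fileContents")


def rest_url(kind: str, irods_path: str, ticket: str,
             metadata: bool = False, base: str = DEFAULT_BASE) -> str:
    """Compose <base>/<kind><iRODS path>[/metadata]?ticket=<ticket>."""
    if kind not in KINDS:
        raise ValueError(f"unknown resource type: {kind!r}")
    if not irods_path.startswith("/"):
        raise ValueError("expected an absolute iRODS path, e.g. /zdv/home/...")
    # quote() keeps "/" but percent-encodes spaces and other unsafe characters.
    url = f"{base.rstrip('/')}/{kind}{quote(irods_path)}"
    if metadata:
        url += "/metadata"
    return url + "?" + urlencode({"ticket": ticket})
```

The resulting URL can be passed to curl, wget or urllib.request.urlopen.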

Data Policy/Recommendation

The “Creator” is the person responsible within the meaning of the German Copyright Act (Urheberrechtsgesetz) for ensuring that any reuse of third-party data is lawful, and within the meaning of the GDPR (DSGVO) for ensuring that personal data is handled correctly. This responsibility remains even if the “Creator” is no longer employed at the university.

There is a decision guide on whether data can be published, unfortunately only in German.

Licensing

Different kinds of licenses exist for various cases. This is just an incomplete list of the most common open access licenses for the three most common data types:

The applicability of CC-BY licenses to datasets is doubtful. Other licenses can be found via the Open Definition Licenses Service.

Proprietary file formats should be avoided, since you don't know if the software to open them still exists in a few years. Try to stick to open standards.

Further Documentation

There are many more commands. You can look them up in the original documentation: iRODS documentation

other wikis:

Last modified: 2019/08/20 11:21 by jsteinka