data_management:irods

Differences

This shows you the differences between two versions of the page.

data_management:irods [2019/08/19 14:10]
jsteinka [Publishing]
data_management:irods [2020/01/30 12:36] (current)
jsteinka [Metadata: ''imeta'']
Line 1: Line 1:
 +<WRAP center round important 60%>
 +Please note that this service is still in its beta phase. Do not hesitate to [[training_and_outreach:ticket_system|contact us]] to discuss your data management plan.
 +</​WRAP>​
 +
 ====== Archiving with iRODS ======
  
Line 5: Line 9:
   *  files are (//data objects//).
  
-All commands from the iRODS command line tools start with an '//i//'.
 +All commands from the iRODS command line tools start with an '//i//'; they are installed on the MOGON login nodes.
 + 
 +iRODS uses so-called '//Resources//' to archive the //collections// and //data objects//. The //resources// are organized hierarchically: the root is a //replication resource//, to which other //resources// are added as children. Currently there is a //compound resource//, which itself consists of a cache (Unix filesystem) and a universal mass storage system (here: TSM) as archive. The cache has a size of 8 TB; once it fills up, the oldest //data objects// are deleted from the cache. If required, they are fetched back from the archive.
 + 
 +<​code>​ 
 +replResc:​replication 
 +├── cephfsResc:​unixfilesystem 
 +└── compResc:​compound 
 +    ├── netappResc:​unixfilesystem 
 +    └── tsmResc:​univmss 
 +</​code>​
  
 On request it is still possible to apply for a [[archiving:tsm|TSM account]] to store the data on tapes.
Line 18: Line 32:
  
 If a directory (that is, a //collection//) consists of many small files, those files should be [[archiving:preparation|compressed]]. iRODS works best for files > 5GB.
 +{{ :​data_management:​test-large-files.png?​direct&​400 | iRODS upload benchmark with one file of varying size.}} 
 +{{ :​data_management:​test-small-files.png?​direct&​400 | iRODS upload benchmark with multiple tiny files.}}
 ===== iRODS Account =====
  
Line 27: Line 42:
  
 Collections in the home folder of an individual user will be deleted once the account is deleted. Only the // /zdv/project/<PROJECT NAME> // collections will be archived for an appropriate period (default: 10 years, as suggested by the [[https://www.dfg.de/foerderung/grundlagen_rahmenbedingungen/gwp/|DFG, Leitlinie 17]]).
 +
 +The data saved to the above project collection is owned by the group. This means after a user leaves the project, the data can still be accessed by the other group members, as long as the ACLs are not modified.
 </WRAP>
  
Line 51: Line 68:
 **Security warning**
 
-If you initiate the command ''iinit'' it might happen that an additional file '//.irodsA//' is also in the folder '//.irods//'. Please remove this file! It contains your entered password in a decryptable form.
 +After running ''iinit'', an additional file '//.irodsA//' may appear in the folder '//.irods//'. Please remove this file! It contains your password in a decryptable form.
 </WRAP>
  
Line 72: Line 89:
 Accessible folders:
   * ''/zdv/home/${USER}'' private directory
-  * ''/zdv/trash/${USER}'' private trash bin
   * ''/zdv/project/<PROJECT NAME>'' project directory
   * ''/zdv/home/public'' every registered user can read/write/delete
 +  * ''/zdv/trash/home/${USER}'' private trash bin
 +
 +
 +<WRAP center round info 80%>
 +Only the project directories are meant for permanent archiving. We are working on a solution to prevent archiving to the **volatile** iRODS homes.
 +</​WRAP>​
  
 ==== Archiving ====
Line 108: Line 130:
  
 === Example ===
-<code bash>
 +1. Get your Kerberos ticket and print some information about your iRODS account
 +{{ :data_management:irods_01_kinit.png?600 |}}

-[user@login01 ~]$ kinit
-Password for user@UNI-MAINZ.DE
 +2. Navigation
 +{{ :data_management:irods_02_navigation.png?600 |}}

-[user@login01 ~]$ ipwd
-/zdv/home/user
 +3. Archiving, retrieving files and simple information
 +{{ :data_management:irods_03_archiving_files.png?600 |}}

-[user@login01 ~]$ ils /zdv/home/public
-/zdv/home/public:
-  hello_world.txt
-[user@login01 ~]$ ils -l /zdv/home/public
-/zdv/home/public:
-  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
-  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
-[user@login01 ~]$ ils -L /zdv/home/public
-/zdv/home/public:
-  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
-        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
-  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
-        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
-[user@login01 ~]$ ils -A /zdv/home/public
-/zdv/home/public:
-        ACL - g:public#zdv:own
-        Inheritance - Disabled
-  hello_world.txt
-        ACL - public#zdv:read object   rods#zdv:own
-[user@login01 ~]$ ils -LA /zdv/home/public
-/zdv/home/public:
-        ACL - g:public#zdv:own
-        Inheritance - Disabled
-  rods              0 replResc;compResc;netappResc           24 2019-08-19.11:00 & hello_world.txt
-        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
-        ACL - public#zdv:read object   rods#zdv:own
-  rods              1 replResc;compResc;tsmResc           24 2019-08-19.11:01 & hello_world.txt
-        generic    /fsapp/iRODS/Vault/home/public/hello_world.txt
-        ACL - public#zdv:read object   rods#zdv:own
 +4. Checksum
 +{{ :data_management:irods_04_checksum.png?800 |}}
  
-[user@login01 ~]$ iget /​zdv/​home/​public/​hello_world.txt 
-[user@login01 ~]$ ls -l hello_world.txt ​ 
--rw-r----- 1 user zdv 24 Aug 19 11:18 hello_world.txt 
- 
-[user@login01 ~]$ iput hello_world.txt ​ 
-[user@login01 ~]$ ils -L 
-/​zdv/​home/​user:​ 
-  user          0 replResc;​compResc;​netappResc ​          24 2019-08-19.11:​20 & hello_world.txt 
-        generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
-  user          1 replResc;​compResc;​tsmResc ​          24 2019-08-19.11:​20 & hello_world.txt 
-        generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
- 
-ichksum hello_world.txt 
-    hello_world.txt ​   sha2:​XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA= 
-Total checksum performed = 1, Failed checksum = 0 
-[user@login01 ~]$ ils -L 
-/​zdv/​home/​user:​ 
-  user          0 replResc;​compResc;​netappResc ​          24 2019-08-19.11:​20 & hello_world.txt 
-    sha2:​XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA= ​   generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
-  user          1 replResc;​compResc;​tsmResc ​          24 2019-08-19.11:​20 & hello_world.txt 
-        generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
- 
-[user@login01 ~]$  sha256sum hello_world.txt | cut -d " " -f 1 | xxd -r -p | base64 
-XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA= 
- 
-[user@login01 ~]$ irm -f hello_world.txt ​ 
-[user@login01 ~]$ iput -k hello_world.txt ​ 
-[user@login01 ~]$ ils -L 
-/​zdv/​home/​user:​ 
-  user          0 replResc;​compResc;​netappResc ​          24 2019-08-19.11:​27 & hello_world.txt 
-    sha2:​XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA= ​   generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
-  user          1 replResc;​compResc;​tsmResc ​          24 2019-08-19.11:​28 & hello_world.txt 
-    sha2:​XPdR4XQP49lWUGEfPJz0Jo+kmkndGxz6rCQUzCqHteA= ​   generic ​   /​fsapp/​iRODS/​Vault/​home/​user/​hello_world.txt 
-</​code>​ 
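The removed transcript above shows that ''ichksum'' reports a checksum of the form ''sha2:<base64>'' and that the same value can be reproduced locally from the SHA-256 digest of the file. A minimal sketch of such a local check follows, assuming a Linux shell with ''sha256sum'', ''fold'' and ''base64'' available. The function name ''irods_sha2'' is hypothetical, and unlike the transcript it converts the hex digest to raw bytes with ''printf'' rather than ''xxd''.

```shell
# Hypothetical helper (not part of the icommands): print the checksum of a
# local file in the "sha2:<base64>" form shown by ichksum in the transcript.
irods_sha2() {
    # Hex digest of the file (first field of the sha256sum output)
    hex=$(sha256sum "$1" | cut -d ' ' -f 1)
    # Turn each hex pair into one raw byte via an octal escape,
    # then base64-encode the resulting 32 bytes
    b64=$(printf '%s\n' "$hex" | fold -w 2 | while read -r pair; do
        printf "\\$(printf '%03o' "0x$pair")"
    done | base64)
    printf 'sha2:%s\n' "$b64"
}

# Usage on a login node (assumes the file was uploaded before):
#   irods_sha2 hello_world.txt
#   ichksum hello_world.txt    # should report the same sha2: value
```

If the two values differ, the local file and the archived //data object// are presumably not identical, and the upload should be repeated.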
 ==== Metadata: ''imeta'' ====
  
Line 275: Line 237:
 As you can see above, we generate as many metadata attributes as possible automatically, to hopefully simplify your life. Nevertheless, you can adjust and extend them to your needs.
  
-
-  * **Title** free text (//user input needed//)
-  * **Creator** full user name (//created automatically//)
-  * **Publisher** "Johannes Gutenberg-University" (//created automatically//)
-  * **Location** "Mainz, Germany" (//created automatically//)
-  * **Date** Unix timestamp (//created automatically//)
-  * **ExpiryDate** Date + 10 years (//created automatically//)
-  * **Type** audio, data set, image, source code, ... (//user input needed//)
-  * **Format** simply the file format (e.g. output from *file* command) (//user input needed//)
-  * **AccessRights** "closed", "restricted", "embargoed", "open" (//default: "closed"//)
-    * **AccessConditions** if AccessRights is "resticted" (not yet)
-    * **EmbargoDate** if AccessRights is "embargoed" (not yet)
-  * **protected** (//default: "false"//)
 +  * Set automatically:
 +    * **Creator** full user name
 +    * **Publisher** "Johannes Gutenberg-University"
 +    * **Location** "Mainz, Germany"
 +    * **Date** Unix timestamp
 +    * **ExpiryDate** Date + 10 years
 +    * **protected** (//default: "false"//)
 +  * User input required:
 +    * **Title** free text
 +    * **Description** text
 +    * **Type** audio, data set, image, source code, ...
 +    * **Format** simply the file format (e.g. output from *file* command)
 +    * **AccessRights** "closed", "restricted", "embargoed", "open" (//default: "closed"//)
 +      * **AccessConditions** if AccessRights is "restricted" (not yet)
 +      * **EmbargoDate** if AccessRights is "embargoed" (not yet)
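
The attributes that require user input could be set with ''imeta add -d <object> <attribute> <value>''. The following is only a sketch: the helper ''imeta_user_attrs'' is hypothetical, it merely prints the commands instead of running them, and whether ''add'' or ''set'' is the right sub-command depends on whether an attribute already exists on the object.

```shell
# Hypothetical helper: print (not run) the imeta calls for the attributes
# that need user input.
#   $1 data object, $2 Title, $3 Description, $4 Type, $5 Format
imeta_user_attrs() {
    obj="$1"
    for pair in "Title=$2" "Description=$3" "Type=$4" "Format=$5"; do
        # ${pair%%=*} is the attribute name, ${pair#*=} its value
        printf 'imeta add -d "%s" "%s" "%s"\n' "$obj" "${pair%%=*}" "${pair#*=}"
    done
}

# Usage: review the printed commands before executing them by hand
#   imeta_user_attrs hello_world.txt "Hello World" "A greeting" "data set" "ASCII text"
```

Printing the commands first keeps the example free of side effects; values that contain double quotes would need additional escaping.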
  
 <WRAP center round info 80%>
Line 299: Line 263:
   * **Identifier** (provided by ZDV/UB, only if attribute "protected" is set; not yet)
   * **License** The license for reuse. Recommended: GPL for code, CC0 for data sets, otherwise CC-BY
-  * **Subject** any keywords
 +  * **Keywords** any keywords
  
 === Additional Recommended Attributes ===
Line 305: Line 269:
   * **Contributor** co-authors
   * **Reference** publication references
-  * **Description** free test 
-  * **Abstract** free text 
  
  
Line 315: Line 277:
     * [[https://www.ddialliance.org/|Data Documentation Initiative]]
     * [[https://www.radar-service.eu/radar-schema|RADAR]]
 +    * [[https://schema.org/|schema.org]]
   * subject specific
     * [[http://www.dcc.ac.uk/resources/metadata-standards|Digital Curation Centre]]
Line 338: Line 301:
 dataObj: hello_world.txt
 </code>
 +
 +=== via database query: ''​iquest''​ ===
 +
 +The query syntax is more complex; consult the [[https://docs.irods.org/4.2.6/icommands/user/#iquest|online help]].
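
As an illustration of such a query, the sketch below builds a condition string that selects all data objects carrying a given metadata attribute and value. The helper name ''metadata_query'' is hypothetical; the column names are standard iRODS general-query columns, but please check them against the online help for the installed version.

```shell
# Hypothetical helper: build an iquest select string that finds data objects
# by one metadata attribute ($1) and its value ($2).
metadata_query() {
    printf "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = '%s' AND META_DATA_ATTR_VALUE = '%s'" "$1" "$2"
}

# Usage on a login node:
#   iquest "$(metadata_query Title 'Hello World')"
```

Values containing single quotes are not escaped in this sketch.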
  
 ==== Publishing ====
Line 353: Line 320:
  
 <code bash>
-[user@login01 ~]$ curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb
 +[user@login01 ~]$ curl https://irods-web.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb
 </code>
 <code JavaScript>
Line 385: Line 352:
  
 <code bash>
-[user@login01 ~]$ curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt/metadata?ticket=ACR2RKDyuZMBRmb
 +[user@login01 ~]$ curl https://irods-web.zdv.uni-mainz.de/irods-rest/rest/dataObject/zdv/home/jsteinka/hello_world.txt/metadata?ticket=ACR2RKDyuZMBRmb
 </code>
 <code JavaScript>
Line 443: Line 410:
 The file content can be viewed with ''curl'' or downloaded with ''wget''.
 <code bash>
-curl https://irods-test-01.zdv.uni-mainz.de/irods-rest/rest/fileContents/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb
 +curl https://irods-web.zdv.uni-mainz.de/irods-rest/rest/fileContents/zdv/home/jsteinka/hello_world.txt?ticket=ACR2RKDyuZMBRmb
 </code>

 === Retrieve the metadata of a collection ===
 <code bash>
-curl https://irods-test.zdv.uni-mainz.de/irods-rest/rest/collection/zdv/home/public/helloCollection?ticket=mbyAAFGm7vhUdyM
 +curl https://irods-web.zdv.uni-mainz.de/irods-rest/rest/collection/zdv/home/public/helloCollection?ticket=mbyAAFGm7vhUdyM
 </code>
 <code JavaScript>
Line 473: Line 440:
  
 The URL for the REST API consists of:
- * Server (https://irods-test.uni-mainz.de/irods-rest/rest)
- * what (collection|dataObject|fileContents)
- * optionally 'metadata'
- * Ticket string (?ticket=1234567890)
 +  * Server (https://irods-web.zdv.uni-mainz.de/irods-rest/rest)
 +  * what (collection|dataObject|fileContents)
 +  * iRODS path
 +  * optionally 'metadata'
 +  * Ticket string (?ticket=1234567890)
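
The parts listed above can be assembled mechanically. The following sketch does so in a small shell function; the name ''irods_rest_url'' is hypothetical, and the server prefix is the one from the examples on this page.

```shell
# Hypothetical helper: assemble a REST URL from its parts.
#   $1 what: collection | dataObject | fileContents
#   $2 absolute iRODS path, e.g. /zdv/home/public/helloCollection
#   $3 ticket string
#   $4 optional: the literal word "metadata"
irods_rest_url() {
    server="https://irods-web.zdv.uni-mainz.de/irods-rest/rest"
    url="$server/$1$2"
    if [ "${4:-}" = "metadata" ]; then
        url="$url/metadata"
    fi
    printf '%s?ticket=%s\n' "$url" "$3"
}

# Usage:
#   curl "$(irods_rest_url dataObject /zdv/home/jsteinka/hello_world.txt ACR2RKDyuZMBRmb metadata)"
```

The path is inserted as given; names with spaces or other special characters would need URL encoding first.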
  
 For further information, please read the original [[https://github.com/DICE-UNC/irods-rest|IRODS-REST documentation]].
Line 483: Line 451:
 The "Creator" is the responsible person in the sense of the Urheberrechtsgesetz (German copyright law), ensuring that the reuse of third-party data is legal, and in the sense of the DSGVO (GDPR), ensuring that personal data is handled correctly. This holds even if the "Creator" is no longer employed at the university.
  
-
-If user leaves the university file ownership goes to the next hierarchical user.
 +A [[https://doi.org/10.5281/zenodo.3368293|decision guide]] on whether data can be published is available, unfortunately only in German.
  
 ==== Licensing ====
Line 494: Line 461:
     * [[https://www.gnu.org/licenses/|GPL, GPLv2, GPLv3]]
     * [[https://opensource.org/licenses/MIT|MIT]]
 +    * [[http://www.linfo.org/bsdlicense.html|BSD]]
   * Arts, Images, Text, etc.
     * [[https://creativecommons.org/choose/|Creative Commons (Text, Arts, Photos, ...)]]
Line 499: Line 467:
     * [[https://opendatacommons.org/licenses/|Open Data Commons]]
  
-The applicability of CC-BY licenses for datasets is [[https://ckan4rdm.wordpress.com/2019/06/05/creative-commons-lizenzen-sind-fur-forschungsdaten-ungeeignet/|doubtful]]. Other licenses search at [[https://licenses.opendefinition.org/|Open Definition Licenses Service]]
 +
 +CC-BY licenses are not recommended for software: see the [[https://creativecommons.org/about/program-areas/software/|CC recommendation]] and this [[https://opensource.stackexchange.com/questions/1717/why-is-cc-by-sa-discouraged-for-code|discussion]]. The same applies to datasets: their publication under a CC license other than CC0 is [[https://ckan4rdm.wordpress.com/2019/06/05/creative-commons-lizenzen-sind-fur-forschungsdaten-ungeeignet/|doubtful]]. For other dataset licenses, search the [[https://licenses.opendefinition.org/|Open Definition Licenses Service]].
 +
  
 Proprietary file formats should be avoided, since you cannot know whether the software to open them will still exist in a few years. Try to stick to open standards.
data_management/irods.1566216654.txt.gz · Last modified: 2019/08/19 14:10 by jsteinka