
The instructions here are mostly for the desktop Linux PCs that have been installed by Andreas Nussbaumer (or from a boot server maintained by Andreas) and that have been modified such that central administration is not possible (or rendered ineffective). If you run your own SSH server, please be sure to follow the same guidelines.

This setup is currently in a test phase. Please try to use it as much as you can.

The university is trying to reduce the number of open ports to the outside world. One of the open ports is port 22 for SSH. Although SSH is considered to have few vulnerabilities when set up correctly and kept up to date, SSH servers are common targets of brute-force attacks that either probe weak passwords or decrease availability for users.

The ETAP Linux desktops (and some servers) set up by Andreas Nussbaumer usually block IP addresses (not users) for hours if the password has been entered incorrectly three times. You can also use ssh-keys for added security (see the section about ssh-keys), ideally protected by a passphrase as well (see the remarks on using ssh-keys). The university LDAP user authentication blocks users (not IP addresses) when many failed login attempts occur within a short time.

Both measures are complemented by allowing SSH connections only from within the university network and routing all outside connections through the sshgate.

On Linux you can create an ssh-key with:

ssh-keygen -b 4096

(-b 4096 ensures large RSA keys that are considered safe as of 2020; see the Wikipedia article on RSA.)

Please note that if you already have a key, the key files id_rsa and id_rsa.pub will be overwritten. You can also generate more than one key and use different keys for different servers.
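Since ssh-keygen overwrites the default file names, it can be handy to give each key its own file; a minimal sketch (the temporary directory and the file name id_rsa_sshgate are illustrative, and in real use you would answer the passphrase prompt instead of passing -N ''):

```shell
# Generate a second 4096-bit RSA key under a custom file name so that an
# existing ~/.ssh/id_rsa is not overwritten. The temporary directory stands
# in for ~/.ssh; -N '' (empty passphrase) is only for this demonstration.
keydir=$(mktemp -d)
ssh-keygen -t rsa -b 4096 -N '' -C "SSHGATE" -f "$keydir/id_rsa_sshgate" -q
ls "$keydir"   # id_rsa_sshgate and id_rsa_sshgate.pub
```

The key can then be selected per connection with `ssh -i` or via the IdentityFile option in the ssh-config.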

Deploy your key by following the instructions of the ZDV.

Most importantly, the comment that you enter when adding your key to your account must contain the string SSHGATE, HPCGATE or LINUXGATE (multiple purposes can be separated by a comma, e.g. HPCGATE,SSHGATE). This also means that people who have already deployed a key for MOGON do not need to register a new key.

Always protect your ssh-key with a passphrase: if someone steals your ssh-key files (the private key), the attacker still needs the passphrase to decrypt it. The passphrase should follow common password guidelines (strong, not reused elsewhere, etc.).

Your ssh-key consists of two files/parts: the public key (id_rsa.pub) and the private key (id_rsa). The public key is free to be distributed; you can give it to other (even untrusted) people, websites or computers. The private key must be protected from other people (ssh even refuses to use a private key file that grants access rights to anyone but its owner).
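The required access rights can be set explicitly with chmod; a sketch using a temporary directory (in practice the directory is ~/.ssh):

```shell
# Customary permissions for ssh key files: ssh refuses to use a private key
# that is readable by the group or by others. The directory is illustrative.
sshdir=$(mktemp -d)              # stands in for ~/.ssh
touch "$sshdir/id_rsa" "$sshdir/id_rsa.pub"
chmod 700 "$sshdir"              # directory: accessible by the owner only
chmod 600 "$sshdir/id_rsa"       # private key: owner read/write only
chmod 644 "$sshdir/id_rsa.pub"   # public key may be world-readable
```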

On the ETAP computers the ssh-key can be used for login. However, your home directory needs to be mounted first, and this only works with your password. So the first time you log into a PC you might need to enter your password; only after that is your ssh-key accepted.

This works similarly to the login to MOGON through the hpcgate. You do not need to use the sshgate when you connect directly to the hpcgate, nor when you are inside the university network (physically or through a VPN).

The option -J automatically uses the sshgate as a jump host (or jump proxy). You can use different usernames for the two logins and also different ssh-keys (not shown in the example below); by default your current username ($USER) and key (id_rsa) are used.

ssh -J UNIVERSITYUSERNAME@sshgate.zdv.uni-mainz.de USERNAME@TARGET

For other operating systems or for programs other than ssh, look for an option called jump proxy or jump host. If you want to use scp, rsync or similar, use the appropriate option (or the option that sets the ssh command) together with the command and options above. For older ssh versions you can try the options on this webpage: WikiBooks ssh. For more convenience use your ssh-config (see below).

In short, the ssh-config allows you to use an alias when connecting with ssh, and it can set all the options for a certain connection:

Host sshgate
    HostName sshgate.zdv.uni-mainz.de
    User <username>
    IdentityFile ~/Path/To/Private/Key1

Host myhost
    HostName myhost.physik.uni-mainz.de
    User <username>
    ProxyJump sshgate
    ForwardX11 yes
    IdentityFile ~/Path/To/Private/Key2

In this example two aliases, sshgate and myhost, have been created; the alias myhost references the alias sshgate. For both aliases a number of options, like the username and the path to the ssh-key, have been defined, so that you only need to run ssh myhost to access myhost.physik.uni-mainz.de with your correct username(s) through the sshgate.

In the following, MOGON is used as a synonym for MOGON II. All machines on MOGON II run a version of CentOS 8, so that it can support running software as on LXPLUS (via a container, see the setup below). For general usage of ATLAS software have a look at common ATLAS wiki pages like here1). See below for some specific topics on setting up software.
The ZDV hosts a wiki about MOGON with some useful hints: https://mogonwiki.zdv.uni-mainz.de/dokuwiki/start Please consider joining the HPC Mattermost group as well: https://mattermost.gitlab.rlp.net/hpc-support

Generally the MOGON login and worker nodes have limited access to the internet; the worker nodes have only the necessary network connections, so behaviour regarding access to computers outside the MOGON network may differ from the login nodes. Login nodes have limited access to the university network, which includes:

• gitlab.rlp.net
• all wetap/etap machines

In addition, some common websites have been enabled by the HPC team (this is subject to change by the HPC team):

• common websites for python pip installations
• gitlab.cern.ch for gitlab access via https

Since ssh connections to university computers (and from university computers via the hpcgate) are not limited, you can use ssh to “bridge” connections2)3). Please note that you must understand what you are doing, and connections should only be opened when needed. Any misuse can lead to a ban from using MOGON or to stricter limitations on ssh for everyone.

Your starting directory is your home directory, which is different from your university home directory. You should store your code and other files that you do not change often here.

There is a so-called project/scratch space available at /gpfs/fs7/atlas/ where you can create your own directory and store your output files. Its size is 125 TB for all Mainz ATLAS/ETAP users.

A large fraction of the storage is realized as a grid storage with the name MAINZ_LOCALGROUPDISK. Writing to this storage should be done via grid tools (see below). You can directly read the files from the storage in the directory /lustre/miifs02/storm/atlas/atlaslocalgroupdisk/rucio/ which then follows the rules for local storage of grid files. How to find a particular file is also explained below.
For your jobs on MOGON you should make use of the temporary directory $TMPDIR that is created specifically for each job. It resides on a very fast storage system and is deleted at the end of the job. Before the job ends, you can copy your output files to the project/scratch space.
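As an illustration, a batch script could use $TMPDIR like this (a sketch; the partition, account and analysis command are placeholders, not actual MOGON settings):

```shell
#!/bin/bash
#SBATCH --partition=parallel        # placeholder, choose your partition
#SBATCH --account=<your_account>    # placeholder
#SBATCH --time=02:00:00

# Work in the fast, job-local $TMPDIR, which is deleted when the job ends.
cd "$TMPDIR"
./run_analysis --output out.root    # placeholder for your actual workload

# Persist the output to the project/scratch space before the job finishes.
cp out.root /gpfs/fs7/atlas/<your_dir>/
```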

CvmFS4) is installed on the login nodes as well as on the worker nodes.

#### Setup

Put this in your .bashrc:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'

Then you can enable the ATLAS environment. Due to the CentOS 8 operating system, this currently runs within a Singularity container and can be started with:

setupATLAS -c centos7 -b

If you want to run a script inside the container, you have to specify the script before running setupATLAS:

export ALRB_CONT_RUNPAYLOAD="<YourScript>"

In order to mount gpfs and lustre, create a file ~/alrb_container.cfg.sh containing:

if [ -d /lustre ]; then
    set_ALRB_ContainerEnv ALRB_CONT_CMDOPTS "-B /lustre:/lustre"
fi
# set HOME to be the same as that of regular home
set_ALRB_ContainerEnv ALRB_CONT_POSTSETUP "export HOME='$HOME'"

Follow the common instructions on setting up ATLAS software via CvmFS, e.g. lsetup or showVersions.

#### General remarks

- You should only store data on MOGON that is related to your work on MOGON. The fileserver is not intended as a backup system.
- We want to reserve miifs02 for the grid site. All your personal (MOGON-related) data should be stored on /gpfs/fs7 (more details below).

- Data you want to archive and do not need to access on a regular basis can be stored in the MOGON archive using iRODS

#### Request samples

Use rucio to store samples on our grid site (using https://rucio-ui.cern.ch/r2d2/request) instead of downloading them to a local folder. This way users can share datasets, and data that is no longer needed will be removed after the lifetime you can define there. For all rucio operations a grid certificate is needed. Due to the privacy policy on MOGON, you have to create the proxy on a machine with CVMFS and internet access (e.g. lxplus). There you call:

setupATLAS
lsetup rucio
voms-proxy-init -voms atlas --valid 48:00 --out gridproxy.x509
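The resulting proxy file then needs to be transferred; for example with scp (a sketch, where MOGONHOST is a placeholder for a MOGON login node or an ssh-config alias):

```shell
# Sketch: copy the grid proxy to your MOGON home directory.
# MOGONHOST is a placeholder; add -o ProxyJump=... when connecting
# from outside the university network.
scp gridproxy.x509 MOGONHOST:~/
```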

Move gridproxy.x509 to Mogon and call:

setupATLAS #add suitable options here
export X509_USER_PROXY=/home/$(whoami)/gridproxy.x509
lsetup rucio

Once you have stored a DID on the grid site, you can find the corresponding files using:

rucio list-file-replicas DID | grep MAINZ | sed "s|^.*MAINZ|MAINZ|" | awk '{print $2}' | cut -d "=" -f2 | sed "s|^|/lustre/miifs02/storm|"
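To see what the filter chain does, here is a sketch on a single made-up replica line (the sample PFN, host name and file name are invented for illustration; the real input comes from rucio list-file-replicas):

```shell
# A fabricated example line in the style of `rucio list-file-replicas` output;
# only the filter chain is taken from the text above. It strips everything up
# to the MAINZ endpoint, keeps the PFN, extracts the path after "SFN=", and
# prepends the local mount point.
line='| user.dta | file.root | 1 GB | ad:be | MAINZ_LOCALGROUPDISK: srm://se.example.de:8443/srm/managerv2?SFN=/atlas/atlaslocalgroupdisk/rucio/user/dta/aa/bb/file.root |'
echo "$line" \
  | grep MAINZ \
  | sed "s|^.*MAINZ|MAINZ|" \
  | awk '{print $2}' \
  | cut -d "=" -f2 \
  | sed "s|^|/lustre/miifs02/storm|"
# prints /lustre/miifs02/storm/atlas/atlaslocalgroupdisk/rucio/user/dta/aa/bb/file.root
```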

You can store the results of your analysis on our grid site using rucio upload instead of copying them to the scratch space:

rucio upload --rse MAINZ_LOCALGROUPDISK --register-after-upload --lifetime 15552000 --name NAME FILE

Alternatively, you can perform the same for a group of files, e.g.:

rucio upload --rse MAINZ_LOCALGROUPDISK --register-after-upload user.dta:Embedding_DAODs folder/files_in_folder.*.root

The option --register-after-upload registers the file in rucio only after a successful upload, which is especially important when uploading large datasets. Just adjust the username (dta in this case) and the files to create a group. These files can be found via:

rucio list-dataset-replicas user.dta:Embedding_DAODs

Blacklisting is not necessary anymore! ATLAS implemented a distance matrix and a multi-hop schema that should take care of the issues with Australia-ATLAS and TRIUMF-LCG2. The instructions below are kept for reference only.

Your action: You have to blacklist the sites in the table below for all GRID actions (DaTRI, pathena, prun, dq2) !

Background: At the moment there is no connectivity from two ATLAS grid sites to the MOGON cluster, which causes serious problems. DaTRI requests or finished JEDI/PANDA jobs from these sites will not succeed but stay in the queue of active transfers forever, and they have to be removed by hand by grid administrators. We have already received several “tickets” about this problem, but it cannot be solved: the university is not connected to a network with a route from these two sites (sending to these sites works, receiving does not). The following sites are affected:

• Australia-ATLAS
• TRIUMF-LCG2

In detail, these have to be blacklisted:

#### Australia-ATLAS

##### DDM Endpoints
• AUSTRALIA-ATLAS_HOTDISK
• AUSTRALIA-ATLAS_LOCALGROUPDISK
• AUSTRALIA-ATLAS_PHYS-SM
• AUSTRALIA-ATLAS_PRODDISK
• AUSTRALIA-ATLAS_SCRATCHDISK
• AUSTRALIA-ATLAS_SOFT-TEST
• AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK
##### PANDA Australia-ATLAS
• ANALY_AUSTRALIA
• ANALY_AUSTRALIA_GLEXEC
• ANALY_AUSTRALIA_TEST
• Australia-ATLAS
• Australia-ATLAS_MCORE
• Australia-ATLAS_VIRTUAL

#### TRIUMF-LCG2

##### DDM Endpoints
• TRIUMF-LCG2-MWTEST_SCRATCHDISK
• TRIUMF-LCG2_DATATAPE
• TRIUMF-LCG2_GROUPTAPE_PHYS-SUSY
• TRIUMF-LCG2_HOTDISK
• TRIUMF-LCG2_LOCALGROUPDISK
• TRIUMF-LCG2_MCTAPE
• TRIUMF-LCG2_PERF-JETS
• TRIUMF-LCG2_PERF-TAU
• TRIUMF-LCG2_PRODDISK
• TRIUMF-LCG2_SCRATCHDISK
• TRIUMF-LCG2_SOFT-TEST
##### PANDA: TRIUMF
• ANALY_TEST
• ANALY_TRIUMF
• ANALY_TRIUMF_GLEXEC
• ANALY_TRIUMF_HIMEM
• ANALY_TRIUMF_PPS
• TRIUMF
• TRIUMF_HIMEM
• TRIUMF_MCORE
• TRIUMF_PPS
• TRIUMF_VIRTUAL

For most GRID actions (pathena, prun, dq2) it is sufficient to add these parameters:

pathena --excludedSite=ANALY_TRIUMF,ANALY_AUSTRALIA

prun --excludedSite=ANALY_TRIUMF,ANALY_AUSTRALIA

dq2-get --exclude-site=TRIUMF-LCG2_LOCALGROUPDISK,AUSTRALIA-ATLAS_LOCALGROUPDISK

DaTRI requests (on the PANDA web interface) will inform you (with green text at the bottom of the request summary, before you submit) that the transfer will not work. If this occurs, please do not submit the request! It might in the end lead to an exclusion of our Mainz site from the grid, as it causes big trouble in the system.

If your datasets are only at one of these sites, please request a replica (DaTRI user request in the PANDA web interface) to the Karlsruhe FZK-LCG2_SCRATCHDISK. When the replica is complete, the exclusion should work.

#### Cancellation of data transfers

First, identify the dataset's name. Go to PANDA and fill in the “Data Pattern” field with the name of the dataset (e.g., user.tlin*). Choose “transfer” as the “Request status” and click the “list” button to get all of your datasets that are currently transferring.

Second, click on the name of the dataset you would like to stop transferring; this leads to a page with details on the transfer. Check the “Status” and change it to “Stop”. The transfer should now be stopped. You can check the status again as described in the first step; it should then have the status “stopped”.

Some links to check the status of the Mainz site.

# In case of problems
