submit_jobs_to_a_virtual_machine_beta


#### MOTIVATION

Many ATLAS jobs use the experiment-specific software ATHENA, which currently runs only under Scientific Linux 5.x. The native OS installed on all nodes of MOGON is Scientific Linux 6.2. Most nodes (95%) of MOGON are 4-socket, 16-core AMD Bulldozer machines that require a kernel newer than the one shipped with SL5.x. A native SL5.x install on such a node would either not work at all or incur severe performance penalties. Using virtual machines with the desired OS can solve this issue. Unfortunately, the batch system LSF currently offers no mechanism to assign a job to a virtual machine.

#### TECHNICAL REALIZATION

To be able to run ATLAS jobs on MOGON, we created a mechanism to send jobs to a virtual machine. For this purpose, the new queue 'atlastest' was set up. As the name indicates, we are still in a testing phase and not all desirable features are available yet. A set of restrictions also still applies (see CURRENT RESTRICTIONS).

Typical ATLAS jobs create a high I/O rate, which could become a problem for either the network or, if file staging is used, the local disk. In the current setup we enforce file staging. To keep the impact on the local disk of the hypervisor low, each VM receives a RAM disk for local I/O. This RAM disk is exported to the VM via NFS. All user I/O operations inside the VM happen on that RAM disk, which keeps I/O on the local disk of the underlying hypervisor low.

On VM creation, the respective user is created locally on the VM with the home directory “/jobdir”. Login to the VM is possible via the 'vmssh' command, which expects the JOBID as its only argument, e.g. vmssh <JOBID>. Please keep in mind that login is only possible once the job state is 'RUN' and the VM has had time to boot and start its sshd. The latter usually takes about one minute.

The user may set some parameters of the VM the jobs will run in, e.g. the size of the VM's RAM. The parameters of the requested VM have to be set inside the LSF parameter -Jd, which is originally intended to assign a description to a job. Such a job description may contain up to 4094 characters. We use it to describe the characteristics of the VM.
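The login timing described above (job state 'RUN', then roughly one minute of boot time) can be scripted. The following is only a minimal sketch: the function names, the 10 second polling interval and the 60 second default wait are assumptions for illustration, and it relies on the default bjobs listing showing the job state in the third column (STAT).

```sh
# Print the state (STAT, third column) of a job from the default bjobs listing.
job_state() {
    bjobs "$1" 2>/dev/null | awk 'NR == 2 { print $3 }'
}

# Wait until the job is in state RUN, give the VM time to boot and start
# sshd, then log in with vmssh.
# $1 = JOBID, $2 = boot wait in seconds (assumed default: 60)
wait_and_login() {
    jobid=$1
    boot_wait=${2:-60}
    while :; do
        case "$(job_state "$jobid")" in
            RUN) break ;;
            DONE|EXIT)
                echo "job $jobid has already finished" >&2
                return 1 ;;
        esac
        sleep 10    # polling interval, chosen arbitrarily
    done
    sleep "$boot_wait"
    vmssh "$jobid"
}
```

With these functions sourced into the login shell, `wait_and_login <JOBID>` would block until the VM should be reachable. If sshd takes longer than expected, a larger second argument can be passed.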

#### VM SPECIFIC INFORMATION IN JOB SUBMISSION

The following command line arguments (CLA) may be used within -Jd “…”. CLAs must be separated by ';' and must not contain white space:

| CLA | Short description | Default value | Description |
|---|---|---|---|
| -fin= | List of stage-in files | None | Comma-separated list of stage-in files with their absolute paths. If not set, no files will be copied to the VM. |
| -dd= | Destination directory | User's home | Destination directory to which the stage-out files are copied. If not set, the user's homedir/$JOBID/ will be used. |
| -fout= | List of stage-out files | None | Comma-separated list of stage-out files with paths relative to /jobdir on the VM. If not set, no files will be copied after job completion. |
| -mem= | Memory of the VM | 1GByte RAM | Memory size of the VM in kByte (e.g. for 1 GByte write: -mem=1000000). If not set, the default of 1GByte will be used. Keep in mind that the kernel etc. on the running VM will consume some of the memory. |
| -rd= | Size of the RAM disk | 1GByte | Size of the RAM disk for the VM. -rd=1G, -rd=1000M, -rd=1000000k and -rd=1000000000 are all equivalent. If not set, the default of 1GByte will be used. |
| -cpus= | Number of virtual CPUs | 1 CPU | Number of CPUs for the VM. If not set, the default of 1 CPU will be used. PLEASE NOTE: Currently only single-core VMs are allowed (-cpus=1); requests for more CPUs will be ignored. |
| -vmos= | OS for the VM | vm-sl58 | OS installed on the VM. Currently only SL5.8 is allowed (-vmos=vm-sl58). If not set, the default OS vm-sl58 will be used. |
| -ak= | User's authorized keys file | None | Absolute path to the user's authorized_keys file. If not set, no user login to the VM via vmssh will be possible. |
| -verb= | Verbose output about the VM | off | With this option set (-verb=on), the user gets a detailed report of the VM creation process appended to the stderr file of the job. If not set, -verb=off is assumed. |
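The equivalence of the -rd notations (1G = 1000M = 1000000k = 1000000000) implies decimal multipliers. The sketch below shows this arithmetic. The function name rd_to_bytes is hypothetical, and the code only illustrates the unit conversion, not how the queue actually parses the value.

```sh
# Convert an -rd style size (e.g. 1G, 1000M, 1000000k, 1000000000) to bytes,
# assuming decimal multipliers as implied by the equivalence stated above.
rd_to_bytes() {
    size=$1
    case "$size" in
        *G) number=${size%G}; factor=1000000000 ;;
        *M) number=${size%M}; factor=1000000 ;;
        *k) number=${size%k}; factor=1000 ;;
        *)  number=$size;     factor=1 ;;
    esac
    # Reject anything that is not a plain non-negative integer.
    case "$number" in
        ""|*[!0-9]*)
            echo "invalid size: $size" >&2
            return 1 ;;
    esac
    echo $(( $number * $factor ))
}
```

For example, `rd_to_bytes 3000M` prints 3000000000, the size in bytes of the 3 GByte RAM disk written as -rd=3000M.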
##### Example

You wish to run your job in a virtual machine with the following characteristics:

• Create a VM with 2G of Memory, a 3 GByte RAM disk, 1 cpu and the OS SL5.8.
• Copy the stage-in files /project/atlas/in1 and /project/atlas/in2 to the RAM disk before the job starts.
• Copy the files /jobdir/out1 and /jobdir/out2 (both inside the VM) to the directory /home/<USERNAME>/results/ when the job is finished.
• Enable user login to the VM and append a full report of the VM creation and destruction to the user's stderr when the job is finished.

The bsub command line parameter -Jd will look like this:

-Jd "-mem=2000000;-rd=3000M;-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2;-dd=/home/<USERNAME>/results/;\
-fout=out1,out2;-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on"
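Long -Jd strings are easy to get wrong by hand. As a minimal sketch, a small shell function can join the CLAs with ';' and reject arguments that contain white space or a ';'. The function name build_jd is hypothetical and not part of LSF or the MOGON setup.

```sh
# Join the given CLAs with ';' into one -Jd description string.
# Rejects any CLA that contains white space or a ';', since both are
# forbidden inside a single CLA.
build_jd() {
    jd=""
    for cla in "$@"; do
        case "$cla" in
            *[[:space:]]*|*";"*)
                echo "invalid CLA: '$cla'" >&2
                return 1 ;;
        esac
        jd="${jd:+$jd;}$cla"
    done
    printf '%s\n' "$jd"
}

# Possible usage (not executed here):
#   bsub -q atlastest -Jd "$(build_jd -mem=2000000 -rd=3000M -verb=on)" ...
```

Passing each CLA as a separate shell argument also avoids line continuations inside the quoted string, where stray white space can slip in.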
##### Complete example

How job submission on LSF works is described in this section. Here is a complete example command for submitting a job to atlastest:

bsub -q atlastest -W 300 -oo /home/<USERNAME>/stdoutput -eo /home/<USERNAME>/stderroroutput -Jd "-mem=2000000;-rd=3000M;\
-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on" /home/<USERNAME>/jobscript.sh

#### CURRENT RESTRICTIONS

• Inside the command line parameter -Jd the use of characters is restricted:
  • ';' must not be used except to separate the command line arguments
  • filenames should only contain: A-Z, a-z, 0-9, '_', '-', '.'
• No environment variables of your current environment are passed with the job; they have to be set in the jobscript
• Within the VM, LSF commands cannot be used and LSF environment variables are not set
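The filename restriction can be checked before submission. The following is a minimal sketch with a hypothetical function name. It inspects only the last path component, on the assumption that the rule applies to the filename itself and not to the directory separators in a path.

```sh
# Return success if the last path component of $1 consists only of
# A-Z, a-z, 0-9, '_', '-' and '.'.
# Note: for a strict ASCII interpretation of the ranges, run under LC_ALL=C.
valid_filename() {
    name=${1##*/}    # strip any leading directory part
    case "$name" in
        ""|*[!A-Za-z0-9_.-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

A submission wrapper could, for instance, call `valid_filename "$f" || echo "bad name: $f"` for every stage-in and stage-out file before building the -Jd string.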

#### NEXT STEPS

• Lift the current restrictions
• Integrate ATHENA access inside the VMs
• Do a full integration into LSF