Many ATLAS jobs use the experiment-specific software ATHENA, which currently runs exclusively under the OS Scientific Linux 5.x. The native OS installed on all nodes of MOGON is Scientific Linux 6.2. Most nodes (95%) of MOGON are 4-socket 16-core AMD Bulldozer machines that require a kernel newer than the one shipped with SL5.x. A native SL5.x installation on such a node would be technically possible, but would incur severe performance penalties. Executing such jobs inside virtual machines running the desired OS can solve this issue. Unfortunately, the batch system LSF currently offers no mechanism to assign a job to a virtual machine that is created exclusively for that job.
To be able to run ATLAS jobs on MOGON, we created a mechanism that sends jobs to virtual machines which are created on job start. For this purpose the new queue atlastest was installed. The name already indicates that we are still in a testing phase: not all desirable features are in place yet, and a set of restrictions still has to be taken into account (see below in CURRENT RESTRICTIONS). Since typical ATLAS jobs produce a high I/O rate, this could become a problem for either the network or, if file staging is used, the local disk. In the current setup we enforce file staging. In order to keep the impact on the local disk of the hypervisor low, each VM receives a RAM disk for local I/O. This RAM disk is exported to the VM via NFS. All user I/O operations inside the VM take place on that RAM disk, keeping I/O on the local disk of the underlying hypervisor low.
On VM creation the respective user is created locally on the VM, with his home directory being /jobdir. Login to the VM is possible via the vmssh command, which expects the JOBID as its only argument, e.g. vmssh <JOBID>. Please keep in mind that login is only possible after the job state is 'RUN' and after the VM has booted and started its sshd; the latter usually takes about 30 seconds. During this time the stage-in files are already being copied to the RAM disk, to keep the impact of the VM's boot time low.
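The login flow above can be sketched as a small shell helper. This is a hypothetical sketch, not part of the system: `job_state` is a stand-in for parsing the STAT column of `bjobs <JOBID>`, and the fixed 30-second wait is only the rule of thumb stated above.

```shell
# Hypothetical sketch: wait until the job reaches state 'RUN', then give
# the VM ~30 s to boot and start its sshd before logging in with vmssh.
# 'job_state' is a stand-in for parsing the STAT column of 'bjobs <JOBID>'.
wait_and_login() {
    jobid=$1
    while [ "$(job_state "$jobid")" != "RUN" ]; do
        sleep 10    # poll the scheduler until the job is dispatched
    done
    sleep 30        # allow the VM to boot and start its sshd
    vmssh "$jobid"  # log in to the VM created for this job
}
```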
The user may set some parameters of the VM his job will run in, e.g. the size of the VM's RAM. The parameters of the requested VM have to be set inside the bsub option -Jd, which is originally used to assign a description to a job. Such a job description may contain up to 4094 characters; we use it to describe the characteristics of the VM.
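Because of the 4094-character limit, a long list of stage-in files can overflow the job description. A minimal sketch (plain shell; the example string is illustrative) for checking the length before calling bsub:

```shell
# Illustrative sketch: make sure the assembled -Jd string stays within
# LSF's 4094-character limit before passing it to bsub.
JD='-mem=2000000;-rd=3000M;-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2'
if [ ${#JD} -gt 4094 ]; then
    echo "job description too long (${#JD} characters)" >&2
else
    echo "job description OK (${#JD} characters)"
fi
```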
Currently the VMs have restricted outside connectivity via NAT, which is provided by the hypervisor node. You may reach all machines on campus, but only a few hosts on the external internet. This access should not be used to download big files to the node!
Details about the processes involved in job submission to and execution in VMs can be found in Jobsubmission to VMs in LSF - Technical details.
The following command line arguments (CLAs) may be used within -Jd "…". CLAs must be separated by ';' and must not contain white space:
|CLA||Short Description||Default value||Description|
|-fin||List of stage-in files||None||List of comma-separated stage-in files with their absolute paths - if not set, no files will be copied to the VM. On job submission a test is done to check whether all your stage-in files fit on the RAM disk. If they do not, the job will not be submitted.|
|-dd||Destination directory||User's home||Destination directory to which the stage-out files should be copied - if not set, the user's home directory will be used.|
|-fout||List of stage-out files||None||List of comma-separated stage-out files with paths relative to the user's home directory on the VM (/jobdir).|
|-mem||Memory of the VM||1 GByte RAM||Memory size of the VM in kByte (e.g. for 1 GByte write: -mem=1000000).|
|-rd||Size of the RAM disk||1 GByte||Size of the RAM disk for the VM (e.g. -rd=3000M for a 3000 MByte RAM disk).|
|-cpus||Number of virtual CPUs||1 CPU||Number of CPUs for the VM. If not set, the default of 1 CPU will be used. PLEASE NOTE: Currently only single-core VMs are allowed (-cpus=1).|
|-vmos||OS for the VM||vm-sl58||OS installed on the VM. Currently only SL5.8 (-vmos=vm-sl58) is allowed.|
|-ak||User's auth. keys file||None||Absolute path to the user's authorized_keys file - if not set, no user login to the VM via vmssh is possible.|
|-verb||Verbose output about VM creation||off||With this option set (-verb=on), the user gets detailed output of the VM creation process appended to the job's output file.|
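The submission-time check that the stage-in files fit on the RAM disk can be approximated up front. A hedged sketch, assuming file sizes in bytes; the helper name is illustrative and not part of the system:

```shell
# Illustrative helper: sum the sizes of the intended stage-in files and
# compare against the requested RAM-disk size (both in bytes).
fits_on_ramdisk() {
    ramdisk_bytes=$1; shift
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f")           # file size in bytes
        total=$((total + size))
    done
    [ "$total" -le "$ramdisk_bytes" ]  # succeed only if everything fits
}
```

For example, `fits_on_ramdisk $((3000*1024*1024)) /project/atlas/in1 /project/atlas/in2 || echo "stage-in files will not fit"` would flag an oversized request before bsub rejects it.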
Suppose you wish to run your job in a virtual machine with the following characteristics: 2 GByte of RAM, a 3000 MByte RAM disk, 1 CPU, SL5.8 as OS, the stage-in files /project/atlas/in1 and /project/atlas/in2, /home/<USERNAME>/results/ as destination directory for the stage-out files out1 and out2, login enabled via your authorized_keys file, and verbose output about the VM creation. The bsub command line parameter -Jd will then look like this:
-Jd "-mem=2000000;-rd=3000M;-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2;-dd=/home/<USERNAME>/results/;-fout=out1,out2;-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on"
How job submission on LSF is done can be looked up in the section Submit and manage jobs on Mogon (Quickstart). However, here is an example of a complete command for job submission to atlastest:
bsub -q atlastest -W 300 -oo /home/<USERNAME>/stdoutput -eo /home/<USERNAME>/stderroroutput -Jd "-mem=2000000;-rd=3000M;-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2;-dd=/home/<USERNAME>/results/;-fout=out1,out2;-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on" /home/<USERNAME>/jobscript.sh
While your job is running on the VM you may log in via the vmssh command: vmssh <JOBID>. Please keep in mind that login is only possible after the job state is 'RUN' and after the VM has booted and started its sshd (about 30 s).
Inside your jobscript running on the VM, or when you are logged in on the VM, you may download further files to your RAM disk beyond those requested at job submission via -fin=. This can be useful for processing several big files sequentially while keeping the size of your RAM disk low:
ON_VM> cprq.sh <ABSOLUTE PATH TO FILE ON MOGON> <PATH TO DESTINATION DIRECTORY INSIDE THE VM>
Please note: cprq.sh does not check the available space on the RAM disk.
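The pattern described above, fetching and processing big files one at a time so the RAM disk never has to hold more than one of them, can be sketched as follows. This is an illustrative sketch: the copy command and the processing step are passed in as parameters, since cprq.sh only exists inside the VM and `process_one` stands in for whatever the job actually does with a file.

```shell
# Illustrative sketch: copy, process, and delete big input files one at a
# time so the RAM disk only ever holds a single input file.
#   $1: copy command (cprq.sh inside the VM)
#   $2: work directory on the RAM disk (e.g. under /jobdir)
#   remaining args: absolute paths of the input files on MOGON
process_sequentially() {
    copy_cmd=$1; shift
    workdir=$1; shift
    for f in "$@"; do
        "$copy_cmd" "$f" "$workdir"             # fetch one file to the RAM disk
        process_one "$workdir/$(basename "$f")" # stand-in for the actual analysis
        rm -f "$workdir/$(basename "$f")"       # free RAM-disk space before the next file
    done
}
```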
Within -Jd the use of characters is restricted: