====Currently job submission to the queue atlastest is restricted and not open for users====

===MOTIVATION===

Many ATLAS jobs use the experiment-specific software ATHENA, which currently runs exclusively under the OS Scientific Linux 5.x. The native OS installed on all nodes of MOGON is Scientific Linux 6.2. Most nodes (95%) of MOGON are 4-socket machines with 16-core AMD Bulldozer processors that require a kernel newer than the one shipped with SL5.x. A native SL5.x installation on such a node would be technically possible, but would come with severe performance penalties. Executing such jobs inside virtual machines running the desired OS can be a solution to this issue. Unfortunately, the batch system LSF does not currently offer a mechanism to assign a job to a virtual machine that is created exclusively for that job.

===TECHNICAL REALIZATION===

To be able to run ATLAS jobs on MOGON, we created a mechanism to send jobs to virtual machines that are created on job start. For this purpose the new queue **atlastest** was installed. The name already indicates that we are still in a testing phase and that not all desirable features are available yet. A set of restrictions also has to be taken into account (see CURRENT RESTRICTIONS below).

Typical ATLAS jobs create a high I/O rate, which could turn out to be a problem for either the network or - if file staging is used - the local disk. In the current setup we enforce file staging. In order to keep the impact on the local disk of the hypervisor low, each VM receives a RAM disk for local I/O. This RAM disk is exported to the VM via NFS. All user I/O operations inside the VM happen on that RAM disk and thus keep I/O on the local disk of the underlying hypervisor low.

On VM creation the respective user is created locally on the VM, with ''/jobdir'' as their home directory. Login to the VM is possible via the ''vmssh'' command, which expects the JOBID as its only argument, e.g. ''vmssh <JOBID>''. Please keep in mind that login is only possible after the job state is 'RUN' and after the VM has booted and started its sshd; the latter usually takes about 30 seconds (see the sketch at the end of this section). During this time the stage-in files are already being copied to the RAM disk, which keeps the impact of the VM's boot time low.

The user may set some parameters of the VM his jobs will run in, e.g. the size of the VM's RAM. The parameters of the requested VM have to be set inside the ''bsub'' parameter ''-Jd'', which is normally used to assign a description to a job. Such a job description may contain up to 4094 characters; we use it to describe the characteristics of the VM.

Currently the VMs have restricted outside connectivity via NAT, which is handled by the hypervisor node. You may reach all machines on campus, but only a few hosts on the external internet. This access should not be used to download big files to the node!
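As an illustration of the workflow described above, here is a minimal shell sketch that submits a job to **atlastest** and logs in to its VM once the job is running. It assumes the standard LSF output formats of ''bsub'' ("Job <12345> is submitted ...") and ''bjobs''; the job script name and the ''-Jd'' settings are placeholders.
<code>
# Submit a job and extract the job ID from bsub's confirmation message
JOBID=$(bsub -q atlastest -Jd "-mem=1000000;-rd=1G;-ak=$HOME/.ssh/authorized_keys" \
        ./jobscript.sh | sed 's/[^0-9]*\([0-9]*\).*/\1/')

# Poll bjobs until the job state (third column of the output) is RUN
while [ "$(bjobs $JOBID | awk 'NR==2 {print $3}')" != "RUN" ]; do
    sleep 10
done

# Allow roughly 30 seconds for the VM to boot and start its sshd
sleep 30
vmssh $JOBID
</code>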
Details about the processes involved during job submission to and execution in VMs can be found in [[Jobsubmission to VMs in LSF - Technical details]].

===VM SPECIFIC INFORMATION IN JOB SUBMISSION===

The following command line arguments (CLA) may be used within ''-Jd "..."''. CLAs must be separated by ';' and must not contain white space:

^ CLA ^ Short Description ^ Default value ^ Description ^
| ''-fin='' | List of stage-in files | None | Comma-separated list of stage-in files with their absolute paths. If not set, no files are copied to the VM. On job submission a test is done to find out whether all your stage-in files fit on the RAM disk; if not, the job will not be submitted. |
| ''-dd='' | Destination directory | User's home | Destination directory to which the stage-out files are copied. If not set, the user's ''homedir/$JOBID/'' is used. |
| ''-fout='' | List of stage-out files | None | Comma-separated list of stage-out files with paths relative to ''/jobdir'' on the VM. If not set, no files are copied after job completion. |
| ''-mem='' | Memory of the VM | 1 GByte RAM | Memory size of the VM in kByte (e.g. for 1 GByte write ''-mem=1000000''). If not set, the default of 1 GByte is used. Keep in mind that the kernel etc. on the running VM will consume some of the memory. |
| ''-rd='' | Size of the RAM disk | 1 GByte | Size of the RAM disk for the VM; ''-rd=1G'', ''-rd=1000M'', ''-rd=1000000k'' and ''-rd=1000000000'' are all equivalent. If not set, the default of 1 GByte is used. |
| ''-cpus='' | Number of virtual CPUs | 1 CPU | Number of CPUs for the VM. If not set, the default of 1 CPU is used. PLEASE NOTE: currently only single-core VMs are allowed (''-cpus=1''); requests for more CPUs will be ignored. |
| ''-vmos='' | OS for the VM | vm-sl58 | OS installed on the VM. Currently only SL5.8 is allowed (''-vmos=vm-sl58''). If not set, the default OS vm-sl58 is used. |
| ''-ak='' | User's authorized_keys file | None | Absolute path to the user's authorized_keys file. If not set, no user login to the VM via ''vmssh'' is possible. |
| ''-verb='' | Verbose output about VM creation | off | With this option set (''-verb=on''), the user gets a detailed log of the VM creation process appended to the job's ''stderr'' file. If not set, ''-verb=off'' is assumed. |

==Example==

You wish to run your job in a virtual machine with the following characteristics:
  * Create a VM with 2 GByte of memory, a 3 GByte RAM disk, 1 CPU and the OS SL5.8.
  * Copy the stage-in files /project/atlas/in1 and /project/atlas/in2 to the RAM disk before the job starts.
  * Copy the files /jobdir/out1 and /jobdir/out2 (both inside the VM) to the directory /home/<USERNAME>/results/ when the job is finished.
  * Enable user login to the VM and append a full report of the VM creation and destruction to the user's stderr file when the job is finished.

The ''bsub'' command line parameter ''-Jd'' will look like this:
<code>
-Jd "-mem=2000000;-rd=3000M;-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2;-dd=/home/<USERNAME>/results/; \
-fout=out1,out2;-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on"
</code>
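To complement this example, here is a hedged sketch of what a matching job script might look like. It runs inside the VM, where the requested stage-in files already sit in ''/jobdir''; the ''process'' command is a placeholder, not part of the documented interface.
<code>
#!/bin/bash
# Runs inside the VM: the user's home directory is /jobdir and the
# stage-in files requested via -fin= have already been copied there.
cd /jobdir

# Placeholder processing step producing out1 and out2
./process in1 > out1
./process in2 > out2

# out1 and out2 match the -fout= list (paths relative to /jobdir),
# so they are copied to the -dd= directory after job completion.
</code>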
==Complete example==

How job submission on LSF is done can be looked up in the section [[Submit and manage jobs on Mogon (Quickstart)]]. Here is an example of a complete command for job submission to **atlastest**:
<code>
bsub -q atlastest -W 300 -oo /home/<USERNAME>/stdoutput -eo /home/<USERNAME>/stderroroutput -Jd "-mem=2000000;-rd=3000M; \
-cpus=1;-vmos=vm-sl58;-fin=/project/atlas/in1,/project/atlas/in2;-dd=/home/<USERNAME>/results/;-fout=out1,out2; \
-ak=/home/<USERNAME>/.ssh/authorized_keys;-verb=on" /home/<USERNAME>/jobscript.sh
</code>

===WHILE YOUR JOB IS RUNNING===

==Login to VM==

While your job is running on the VM, you may log in via the ''vmssh'' command:
<code>
vmssh <JOBID>
</code>
Please keep in mind that login is only possible after the job state is 'RUN' and after the VM has booted and started its sshd (about 30 seconds).

==Copy new files to the RAM disk==

Inside your job script running on the VM, or when you are logged in on the VM, you may download further files to your RAM disk beyond those initially requested at job submission via ''-fin=''. This can be useful for processing several big files sequentially while keeping the size of your RAM disk low (see the sketch at the end of this page):
<code>
ON_VM> cprq.sh <ABSOLUTE PATH TO FILE ON MOGON> <PATH TO DESTINATION DIRECTORY INSIDE THE VM>
</code>
Note:
  * Only one copy request can be launched at a time.
  * Keep the free space on your RAM disk in mind! ''cprq.sh'' does not check the available space on the RAM disk.

===CURRENT RESTRICTIONS===

  * Inside the command line parameter ''-Jd'' the use of characters is restricted:
    * ';' must not be used except for the separation of the command line arguments
    * filenames should only contain: A-Z, a-z, 0-9, '_', '-', '.'
  * No environment variables of your current environment are passed with the job - they have to be set in the job script
  * Within the VM, LSF commands cannot be used and no LSF environment variables are set

===NEXT STEPS===

  * Lift the current restrictions
  * Integrate ATHENA access inside the VMs
  * Do a full integration into LSF
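==Sketch: sequential processing with cprq.sh==

The following is a hedged sketch of the sequential-processing pattern mentioned in the section "Copy new files to the RAM disk" above. The file names and the ''process'' command are placeholders, and the sketch assumes that ''cprq.sh'' returns only after the copy has finished; if it works asynchronously, an additional wait step would be needed.
<code>
#!/bin/bash
# Runs inside the VM: fetch, process and delete big files one at a
# time, so the RAM disk never has to hold more than one of them.
cd /jobdir
for f in big1.dat big2.dat big3.dat; do
    # Only one copy request may run at a time, and cprq.sh does not
    # check the free space on the RAM disk ...
    cprq.sh /project/atlas/$f /jobdir/
    ./process $f >> results.txt
    # ... so delete each file as soon as it has been processed
    rm -f $f
done
</code>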