submit_jobs_to_a_virtual_machine_beta

Differences

This shows you the differences between two versions of the page.

submit_jobs_to_a_virtual_machine_beta [2012/11/27 15:40]
jahrens created
submit_jobs_to_a_virtual_machine_beta [2012/11/28 14:58] (current)
jahrens
Line 1: Line 1:
 +====Currently job submission to the queue atlastest is restricted and not open for users====
 +
 ===MOTIVATION=== ===MOTIVATION===
-Many ATLAS jobs use the experiment-specific software ATHENA, which currently only runs under the OS Scientific Linux 5.x. The native OS installed on all nodes of MOGON is Scientific Linux 6.2. Most nodes (95%) of MOGON are 4-socket 16-core AMD Bulldozer machines that require a kernel newer than the one shipped with SL5.x. A native SL5.x install on such a node would either not work at all or incur a huge performance penalty. +Many ATLAS jobs use the experiment-specific software ATHENA, which currently runs exclusively under the OS Scientific Linux 5.x. The native OS installed on all nodes of MOGON is Scientific Linux 6.2. Most nodes (95%) of MOGON are 4-socket 16-core AMD Bulldozer machines that require a kernel newer than the one shipped with SL5.x. A native SL5.x install on such a node would be technically possible but would incur a huge performance penalty. 
-Using virtual machines with the desired OS can be a solution to this issue. Unfortunately, the batch system LSF currently offers no mechanism to assign a job to a virtual machine. +Executing such jobs inside virtual machines with the desired OS can be a solution to this issue. Unfortunately, the batch system LSF currently offers no mechanism to assign a job to a virtual machine that is created exclusively for that job.
  
 ===TECHNICAL REALIZATION=== ===TECHNICAL REALIZATION===
-To be able to run ATLAS jobs on MOGON we created a mechanism to send jobs to a virtual machine. +To be able to run ATLAS jobs on MOGON we created a mechanism to send jobs to virtual machines that are created on job start. 
-Therefore the new queue 'atlastest' was installed. The name already indicates that we are still in a testing phase and that not all desirable features are installed yet. A set of restrictions also still has to be taken into account (see CURRENT RESTRICTIONS). As typical ATLAS jobs create a high I/O rate, this could turn out to be a problem for either the network or - if file staging is used - the local disk. In the current setup we enforce file staging. In order to keep the impact on the local disk of the Hypervisor low, each VM receives a RAM disk for local I/O. This RAM disk will be exported to the VM via NFS. All user I/O operations inside the VM will happen on that RAM disk and keep I/Os on the local disk on the underlying Hypervisor low. +Therefore the new queue **atlastest** was installed. The name already indicates that we are still in a testing phase and that not all desirable features are installed yet. A set of restrictions also still has to be taken into account (see below in CURRENT RESTRICTIONS). As typical ATLAS jobs create a high I/O rate, this could turn out to be a problem for either the network or - if file staging is used - the local disk. In the current setup we enforce file staging. In order to keep the impact on the local disk of the Hypervisor low, each VM receives a RAM disk for local I/O. This RAM disk will be exported to the VM via NFS. All user I/O operations inside the VM will happen on that RAM disk and keep I/Os on the local disk of the underlying Hypervisor low. 
-On VM creation the respective user will be created locally on the VM with his home directory being "/jobdir". Login to the VM is possible via the 'vmssh' command, which expects the JOBID as the only argument, e.g. vmssh <JOBID>. Please keep in mind that the login is only possible after the job state is 'RUN' and after the time the VM needs to boot and start its sshd. The latter usually takes about one minute. The user may set some parameters of the VM his jobs will run in, e.g. determine the size of the VM's RAM. The parameters of the requested VM have to be set inside an LSF-Parameter called -Jd, which is originally used to assign a specified description to a job. Such a job description may contain up to 4094 characters. We use it to describe the characteristics of the VM.+ 
 +On VM creation the respective user will be created locally on the VM with his home directory being ''/jobdir''. Login to the VM is possible via the ''vmssh'' command, which expects the JOBID as the only argument, e.g. ''vmssh <JOBID>''. Please keep in mind that the login is only possible after the job state is 'RUN' and after the time the VM needs to boot and start its sshd. The latter usually takes about 30 seconds. During this time the stage in files are already copied to the RAM disk to keep the impact of the boot time of the VM low. 
 + 
 +The user may set some parameters of the VM his jobs will run in, e.g. determine the size of the VM's RAM. The parameters of the requested VM have to be set inside the ''bsub'' parameter ''-Jd'', which is originally used to assign a specified description to a job. Such a job description may contain up to 4094 characters. We use it to describe the characteristics of the VM. 
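The 4094 character limit can be checked before a job is submitted. The following is only a minimal sketch and not part of the MOGON tooling: the function name ''build_vm_description'' is hypothetical, and it assumes that the individual VM options are separated by single spaces inside the description. Please compare with the examples further down this page before relying on it.

```shell
#!/bin/sh
# Join VM options into one job description string and refuse strings
# longer than the 4094 characters allowed for a job description.
# Assumption: options are separated by single spaces (not confirmed here).
build_vm_description() {
    desc="$*"
    if [ "${#desc}" -gt 4094 ]; then
        echo "job description too long: ${#desc} > 4094 characters" >&2
        return 1
    fi
    printf '%s\n' "$desc"
}

# Hypothetical usage (file path and job script are placeholders):
# bsub -q atlastest -Jd "$(build_vm_description -fin=/home/user/in.root -rd=2G)" ./job.sh
```

The function prints the assembled description on success and returns a non-zero status when the string is too long, so it can be used inside a command substitution as shown in the commented usage line.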
 + 
 +Currently the VMs have restricted outside connectivity via NAT, which is performed by the Hypervisor node. You may reach all machines on campus, but only a few on the external internet. This access should not be used to download big files to the node! 
 + 
 +Details about the processes involved during job submission to and execution in VMs can be found in [[Jobsubmission to VMs in LSF - Technical details]] 
  
 ===VM SPECIFIC INFORMATION IN JOB SUBMISSION=== ===VM SPECIFIC INFORMATION IN JOB SUBMISSION===
Line 13: Line 23:
  
 ^ CLR ^ Short Description ^ Default value ^ Description ^  ^ CLR ^ Short Description ^ Default value ^ Description ^ 
-| ''-fin='' | List of stage in files | None | List of comma separated stage in files with their absolute paths - if not set, no files will be copied to the VM |+| ''-fin='' | List of stage in files | None | List of comma separated stage in files with their absolute paths - if not set, no files will be copied to the VM. On job submission a test is done to find out if all your stage in files fit on the RAM disk. If not, the job will not be submitted. |
 | ''-dd='' | Destination directory | User's home | Destination directory, to which the stage out files should be copied - if not set, the user's ''homedir/$JOBID/'' will be used | | ''-dd='' | Destination directory | User's home | Destination directory, to which the stage out files should be copied - if not set, the user's ''homedir/$JOBID/'' will be used |
 | ''-fout='' | List of stage out files | None | List of comma separated stage out files with paths relative to ''/jobdir'' on the VM - if not set, no files will be copied after job completion | | ''-fout='' | List of stage out files | None | List of comma separated stage out files with paths relative to ''/jobdir'' on the VM - if not set, no files will be copied after job completion |
Line 19: Line 29:
 | ''-rd='' | Size of the RAM disk | 1GByte | Size of the RAM disk for the VM. ''-rd=1G = 1000M = 1000000k = 1000000000'' are all equivalent - if not set, the default of 1GByte will be used | | ''-rd='' | Size of the RAM disk | 1GByte | Size of the RAM disk for the VM. ''-rd=1G = 1000M = 1000000k = 1000000000'' are all equivalent - if not set, the default of 1GByte will be used |
 | ''-cpus='' | Number of virtual cpus | 1 CPU | Number of cpus for the VM. If not set, the default of 1 cpu will be used. PLEASE NOTE: Currently only single core VMs are allowed (''-cpus=1''), requests for more cpus will be ignored | | ''-cpus='' | Number of virtual cpus | 1 CPU | Number of cpus for the VM. If not set, the default of 1 cpu will be used. PLEASE NOTE: Currently only single core VMs are allowed (''-cpus=1''), requests for more cpus will be ignored |
-| ''-vmos='' | OS for the VM | vm-sl58 |  OS installed on the VM. Currently only SL5.8 is allowed ''-vmos=vm-sl58'' - if not set, the default OS vm-sl58 will be used |+| ''-vmos='' | OS for the VM | vm-sl58 | OS installed on the VM. Currently only SL5.8 is allowed ''-vmos=vm-sl58'' - if not set, the default OS vm-sl58 will be used |
 | ''-ak='' | User's Auth. Keys file | None | Absolute path to the location to the authorized_keys file of the user - if not set, no user login to the VM via ''vmssh'' will be possible | | ''-ak='' | User's Auth. Keys file | None | Absolute path to the location to the authorized_keys file of the user - if not set, no user login to the VM via ''vmssh'' will be possible |
-| ''-verb='' | Verbose output about VM | off | With this option set (-verb=on), the user gets a detailed output of the process of VM creation appended to his ''stderr'' file of the job - if not set -verb=off is assumed |+| ''-verb='' | Verbose output about VM creation | off | With this option set (-verb=on), the user gets a detailed output of the process of VM creation appended to his ''stderr'' file of the job - if not set -verb=off is assumed |
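The size notation of ''-rd='' and the stage in test described for ''-fin='' can be reproduced locally before submitting. The sketch below is an illustration only: the helper names ''rd_to_bytes'' and ''stage_in_fits'' are hypothetical, and the real submission test may account for file system overhead that is ignored here.

```shell
#!/bin/sh
# Convert a -rd style size (1G, 1000M, 1000000k or plain bytes) to bytes,
# using the decimal multipliers given in the table above.
rd_to_bytes() {
    case "$1" in
        *G) echo $(( ${1%G} * 1000000000 )) ;;
        *M) echo $(( ${1%M} * 1000000 )) ;;
        *k) echo $(( ${1%k} * 1000 )) ;;
        *)  echo "$1" ;;
    esac
}

# Return 0 if all files of a comma separated list fit into the RAM disk.
# $1: RAM disk size in -rd notation, $2: list as it would be given to -fin=
stage_in_fits() {
    limit=$(rd_to_bytes "$1")
    total=0
    old_ifs=$IFS
    IFS=,
    for f in $2; do
        # wc -c reports the file size in bytes; abort on unreadable files
        size=$(wc -c < "$f") || { IFS=$old_ifs; return 2; }
        total=$(( total + size ))
    done
    IFS=$old_ifs
    [ "$total" -le "$limit" ]
}
```

A call such as ''stage_in_fits 2G /path/a.root,/path/b.root'' (placeholder paths) returns success only if the summed file sizes do not exceed the requested RAM disk size. Paths containing commas or wildcard characters are not handled by this sketch.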
  
 ==Example== ==Example==
 +
 You wish to run your job in a virtual machine with the following characteristics: You wish to run your job in a virtual machine with the following characteristics:
  
Line 41: Line 52:
  
 ==Complete example== ==Complete example==
-How job submission on LSF is done can be looked up in this section. However here is an example for a complete command for  +How job submission on LSF is done can be looked up in the section [[Submit and manage jobs on Mogon (Quickstart)]]. However here is an example for a complete command for  
-job submission to **atlstest**:+job submission to **atlastest**:
  
 <code> <code>
Line 50: Line 61:
 </code> </code>
  
 +===WHILE YOUR JOB IS RUNNING===
 +
 +==Login to VM==
 +While your job is running on the VM you may login via the ''vmssh'' command:
 +<code>
 +vmssh <JOBID>
 +</code>
 +Please keep in mind that the login is only possible after the job state is 'RUN' and after the time the VM needs to boot and start its sshd (about 30s).
 +
 +==Copy new files to the RAM disk==
 +Inside your jobscript running on the VM, or while you are logged in on the VM, you may copy more files to your RAM disk than initially requested at job submission via ''-fin=''. This can be useful when processing several big files sequentially while keeping the size of your RAM disk low:
 +
 +<code>
 +ON_VM> cprq.sh <ABSOLUTE PATH TO FILE ON MOGON> <PATH TO DESTINATION DIRECTORY INSIDE THE VM>
 +</code>
 +
 +Note:
 +  * Only one copy request can be launched at a time.
 +  * Keep the free disk space on your RAM disk in mind! ''cprq.sh'' does not check the available space on the RAM disk. 
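Since ''cprq.sh'' does not check the available space, a small wrapper inside the jobscript can do so before each request. This is a minimal sketch under stated assumptions: ''free_bytes'' and ''safe_cprq'' are hypothetical helper names, and the expected file size has to be passed in by the caller because this sketch assumes that the size of the file on MOGON cannot be queried from inside the VM.

```shell
#!/bin/sh
# Print the free space in bytes of the file system holding directory $1.
free_bytes() {
    # df -P prints one line per file system in the POSIX format;
    # with -k, column 4 holds the available space in 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Call cprq.sh only if the expected file size fits into the free space.
# $1: absolute path to the file on MOGON
# $2: destination directory inside the VM
# $3: expected file size in bytes (supplied by the caller, see above)
safe_cprq() {
    avail=$(free_bytes "$2") || return 2
    if [ "$3" -gt "$avail" ]; then
        echo "not enough space in $2: need $3, have $avail bytes" >&2
        return 1
    fi
    cprq.sh "$1" "$2"
}
```

Calling ''safe_cprq'' for one file after the other, and removing each file once it has been processed, also respects the restriction that only one copy request can run at a time.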
 +  
 ===CURRENT RESTRICTIONS=== ===CURRENT RESTRICTIONS===
   * Inside the command line parameter ''-Jd'' the use of characters is restricted:    * Inside the command line parameter ''-Jd'' the use of characters is restricted: 
  • submit_jobs_to_a_virtual_machine_beta.1354027204.txt.gz
  • Last modified: 2012/11/27 15:40
  • by jahrens