Python

Currently, the following versions of Python are installed and can be used by loading the specified modulefile:

Version | Additional Packages | Modulefile
2.6.6 | None | None (System default at /usr/bin/python)
2.7.7 | pip, virtualenv, virtualenvwrapper, bsub, The SciPy Stack 1), Cython, scikit-learn | Python/2.7.7
3.3.5 | pip, virtualenv, virtualenvwrapper, The SciPy Stack 2), Cython, scikit-learn | Python/3.3.5
3.4.1 | pip, virtualenv, virtualenvwrapper, The SciPy Stack 3), Cython, scikit-learn | Python/3.4.1

We recommend avoiding Python 2.6.6, since we can provide better support for the versions that we have installed manually.

If you need additional Python packages, you can easily install them yourself either "globally" in your home directory or inside of a virtual environment.

In general, having a personal Python environment where you can install third-party packages yourself (without needing root privileges) is very easy. The preparation steps needed on Mogon are described below.

While the first variant is already sufficient, we recommend using virtualenvs since they are a lot easier to work with. Virtualenvs can also be shared between users if created in your group's project directory.

First, create some directories in which installed packages will be placed:

$ mkdir -p ~/.local/bin
$ mkdir -p ~/.local/lib/python<VERSION>/site-packages

Then add the created bin directory to your PATH in your .bashrc file and source it:

$ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc

Now create a configuration file for easy_install and pip, the Python package management tools:

$ echo -e '[easy_install]\nprefix = ~/.local' > ~/.pydistutils.cfg
$ mkdir -p ~/.pip
$ echo -e '[install]\nuser = true' > ~/.pip/pip.conf

If you now use easy_install or pip, it will automatically install packages to the correct paths in your home directory.
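To check where such per-user installations end up, you can ask the interpreter itself. The following is a minimal sketch: the helper name `expected_user_site` is made up for illustration, and it assumes the Linux directory layout used in the `mkdir` commands above.

```python
import os
import site
import sys


def expected_user_site(home, version_info):
    """Build the per-user site-packages path for a given Python version.

    Mirrors the ~/.local/lib/python<VERSION>/site-packages layout (Linux).
    """
    version = "%d.%d" % (version_info[0], version_info[1])
    return os.path.join(home, ".local", "lib", "python" + version, "site-packages")


if __name__ == "__main__":
    # Path derived from the directory layout described on this page
    print(expected_user_site(os.path.expanduser("~"), sys.version_info))
    # Path the running interpreter actually uses for per-user installs
    print(site.getusersitepackages())
```

If both lines agree, packages installed with the configuration above should be importable without further setup.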

A so-called virtualenv can be seen as an isolated, self-contained Python environment of third-party packages.
Different virtualenvs interfere neither with each other nor with the system-wide installed packages.

We advise using virtualenv, especially if you intend to install different combinations or versions of various Python packages.

If you are using Python 2.6.6, you need to install virtualenv:

$ easy_install virtualenv
Searching for virtualenv
Reading http://pypi.python.org/simple/virtualenv/
Best match: virtualenv 1.10.1
[...]
Processing dependencies for virtualenv
Finished processing dependencies for virtualenv

We need to remove the easy_install and pip configuration files created above, since the paths set there would interfere with virtualenv:

$ rm ~/.pydistutils.cfg
$ rm ~/.pip/pip.conf

Now you can simply create, activate, use, deactivate and destroy as many virtualenvs as you want:

Create

Creating a virtualenv will simply set up a directory structure and install some baseline packages:

$ virtualenv ENV
New python executable in ENV/bin/python
Installing Setuptools...done.
Installing Pip...done.

With virtualenvs, you can even make each virtualenv use its own version of the Python interpreter:

$ virtualenv --python=/usr/bin/python2.6 ENV2.6
$ virtualenv --python=/cluster/Apps/Python/<VERSION>/bin/python ENV2.7

If you want to use the third-party packages numpy, scipy, matplotlib, … which are already installed globally, you need to add the parameter --system-site-packages to your virtualenv command.
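For comparison, Python 3.4 and newer ship a standard-library module, `venv`, that creates similar environments without the separate virtualenv tool. The sketch below uses `venv`, not virtualenv, and is only meant to show what the system-site-packages setting does; the helper names `create_env` and `read_settings` are made up for illustration.

```python
import os
import shutil
import tempfile
import venv


def create_env(env_dir, system_site_packages=False):
    """Create an environment with the standard-library venv module.

    with_pip=False keeps the sketch fast and offline; returns the path
    of the pyvenv.cfg file that records the environment's settings.
    """
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


def read_settings(cfg_path):
    """Parse the simple 'key = value' lines of a pyvenv.cfg file."""
    settings = {}
    with open(cfg_path) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    try:
        cfg = create_env(os.path.join(scratch, "ENV"), system_site_packages=True)
        print(read_settings(cfg)["include-system-site-packages"])
    finally:
        shutil.rmtree(scratch)  # removing the directory destroys the environment
```

The recorded setting is what decides whether globally installed packages are visible inside the environment.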

Activate

To work in a virtualenv, you first have to activate it, which sets some environment variables for you:

$ source ENV/bin/activate
(ENV)$ # Note the name of the virtualenv in front of your prompt - nice, heh?
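If you are unsure whether a script really runs inside an activated virtualenv, you can check from within Python. This is a minimal sketch; the function name `in_virtualenv` is made up for illustration. It relies on `sys.real_prefix` (set by classic virtualenv) or on `sys.base_prefix` differing from `sys.prefix` (venv and newer virtualenv releases).

```python
import os
import sys


def in_virtualenv(prefix, base_prefix=None, real_prefix=None):
    """Return True if the given prefixes indicate an isolated environment.

    Classic virtualenv sets sys.real_prefix; the venv module (and newer
    virtualenv releases) make sys.base_prefix differ from sys.prefix.
    """
    if real_prefix is not None:
        return True
    return base_prefix is not None and base_prefix != prefix


if __name__ == "__main__":
    active = in_virtualenv(
        sys.prefix,
        getattr(sys, "base_prefix", None),
        getattr(sys, "real_prefix", None),
    )
    print("interpreter: ", sys.executable)
    print("virtualenv:  ", active)
    # The activate script exports VIRTUAL_ENV; it is unset otherwise
    print("VIRTUAL_ENV: ", os.environ.get("VIRTUAL_ENV"))
```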

Use

Now you can use your virtualenv. Newly installed packages are installed only inside the virtualenv and are visible only to the Python interpreter you start from within it:

(ENV)$ easy_install requests
Searching for requests
Reading https://pypi.python.org/simple/requests/
Best match: requests 1.2.3
[...]
Processing dependencies for requests
Finished processing dependencies for requests

or

(ENV)$ pip install requests
Downloading/unpacking requests
  Downloading requests-1.2.3.tar.gz (348kB): 348kB downloaded
  Running setup.py egg_info for package requests
Installing collected packages: requests
  Running setup.py install for requests
Successfully installed requests
Cleaning up...

And now compare what happens with the python interpreter from inside the virtualenv and with the system python interpreter:

(ENV)$ python -c 'import requests'
(ENV)$ /usr/bin/python -c 'import requests'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named requests
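The same comparison can be made without a traceback by testing the import inside a small helper. This is a minimal sketch; the function name `is_importable` is made up for illustration, and the result depends on the interpreter you run it with.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    for name in ("requests", "numpy"):
        status = "available" if is_importable(name) else "missing"
        print("%-10s %s" % (name, status))
```

Run it once with the virtualenv's python and once with /usr/bin/python to see the difference.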

Deactivate

Deactivating a virtualenv reverts the activation step and all its changes to your environment:

(ENV)$ deactivate
$

Destroy

To destroy a virtualenv, simply delete its directory:

$ rm -r ENV

Using multiple virtualenvs can be made much more user friendly using virtualenvwrapper.

If you are using Python 2.6.6, you can install and configure it using

$ easy_install --prefix=$HOME/.local virtualenvwrapper
$ echo 'source $HOME/.local/bin/virtualenvwrapper.sh' >> ~/.bashrc

If you are using any other version of Python, virtualenvwrapper is already installed and you just need to source it from your .bashrc:

$ echo 'source /cluster/Apps/Python/<VERSION>/bin/virtualenvwrapper.sh' >> ~/.bashrc

Log in again to apply the changes.
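virtualenvwrapper keeps all environments in one directory, taken from the WORKON_HOME variable and defaulting to ~/.virtualenvs. As a rough sketch of what this layout looks like from Python (the function name `list_virtualenvs` is made up for illustration, and the check for bin/activate assumes Linux):

```python
import os


def list_virtualenvs(workon_home):
    """Return the names of directories in workon_home that look like virtualenvs."""
    if not os.path.isdir(workon_home):
        return []
    names = []
    for name in sorted(os.listdir(workon_home)):
        # A virtualenv on Linux ships an activate script in its bin directory
        if os.path.isfile(os.path.join(workon_home, name, "bin", "activate")):
            names.append(name)
    return names


if __name__ == "__main__":
    default_home = os.path.join(os.path.expanduser("~"), ".virtualenvs")
    print(list_virtualenvs(os.environ.get("WORKON_HOME", default_home)))
```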

Load Environment Modules (module load [mod])

To load environment modules in Python 2 (execfile does not exist in Python 3):

execfile('/usr/share/Modules/init/python.py')
module('load','gcc/4.8.2')
module('load','software/bioinf/samtools/0.1.19')

From Python 3.4.1 onwards, a modules module ;-) is available on Mogon, e.g.

import modules
modules.module('load', 'Java/jdk1.8.0_25')
import os
os.environ['JAVA_HOME'] # will be '/cluster/Apps/Java/jdk1.8.0_25'

This, of course, requires that your (currently active) Python environment was set up with the --system-site-packages option.
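Loading a module works by changing environment variables of the running process. To see what a particular load actually changed, you can compare a snapshot of os.environ taken before with the state afterwards. This is a minimal sketch; the helper name `env_changes` is made up for illustration.

```python
import os


def env_changes(before, after):
    """Return {name: (old, new)} for every variable that differs."""
    changes = {}
    for name in set(before) | set(after):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


if __name__ == "__main__":
    snapshot = dict(os.environ)
    # ... call module('load', ...) or modules.module('load', ...) here ...
    current = dict(os.environ)
    for name, (old, new) in sorted(env_changes(snapshot, current).items()):
        print("%s: %r -> %r" % (name, old, new))
```

A value of None on the left means the variable was newly set; None on the right means it was removed.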

Job submission

For Python, you can use the basic but friendly bsub package from https://github.com/brentp/bsub:

from bsub import bsub

# Command templates; the ".." parts stand for further options
BAM2FQ = "bam2fq --input %s .."
STAR = "star --align .."
SAM2BAM = "samtools view .."

# 'datasets' must be defined beforehand as a list of input names
for dataset in datasets:
    # Submit the first job, then chain the dependent jobs with then()
    bam2fq = bsub("bam2fastq", R='span[hosts=1] affinity[core(1)]',
                  app='Reserve1G', n=1, q='long', W='2:00')
    bam2fq = bam2fq(BAM2FQ % dataset)
    star = bam2fq.then(STAR, job_name="STAR_%s" % dataset,
                       R='span[hosts=1] affinity[core(6)]',
                       app='Reserve30G', n=1, q='long', W='8:00')
    sam2bam = star.then(SAM2BAM, job_name="SAM2BAM_%s" % dataset,
                        R='span[hosts=1] affinity[core(1)]',
                        app='Reserve10G', n=1, q='long', W='3:00')
    print("First job_id: %s" % bam2fq.job_id)
    print("Last job_id: %s" % sam2bam.job_id)
    last = sam2bam.job_id

print("still running? %s" % ("yes" if bsub.poll(last) else "no"))
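The keyword arguments used above correspond to options of the LSF bsub command (-q queue, -W run time limit, -n number of slots, -R resource requirements, -app application profile, -J job name). As a rough sketch of that mapping, and not the bsub package's actual implementation, the following helpers assemble such a command line and extract the job ID from LSF's reply; the names `build_bsub_command` and `parse_job_id` are made up for illustration.

```python
import re

# LSF acknowledges a submission with a line such as:
#   Job <4711> is submitted to queue <long>.
JOB_ID_PATTERN = re.compile(r"Job <(\d+)> is submitted")


def build_bsub_command(command, job_name, queue, walltime, cores=1,
                       resources=None, app=None):
    """Assemble an LSF bsub argument list (nothing is submitted here)."""
    args = ["bsub", "-J", job_name, "-q", queue, "-W", walltime, "-n", str(cores)]
    if resources:
        args += ["-R", resources]
    if app:
        args += ["-app", app]
    return args + [command]


def parse_job_id(bsub_output):
    """Return the job ID from bsub's reply as a string, or None if absent."""
    match = JOB_ID_PATTERN.search(bsub_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(build_bsub_command("star --align ..", "STAR_demo", "long", "8:00",
                             resources="span[hosts=1] affinity[core(6)]",
                             app="Reserve30G"))
```

Passing the resulting list to subprocess.check_output on a login node would submit the job; the sketch itself only prints it.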

Things to consider

Python is an interpreted language. As such, it should not be used for lengthy runs in an HPC environment. Please take advantage of the possibility to compile your own modules with Cython; consult the relevant Cython documentation. If you do not know how to start, attend a local Python course or schedule a meeting at our local HPC workshop.

Special packages

Please note that we have already installed numpy, scipy and matplotlib in the versions of Python that we provide additionally.

http://www.numpy.org/

When installing NumPy, the first installation attempt fails at exit. Don't worry, the installation is already finished at that point; to be sure, you can simply run the command again and see it exit cleanly.

Note that NumPy can also be linked against the Intel Math Kernel Library or the AMD Core Math Library.
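Which BLAS/LAPACK libraries an installed NumPy was built against can be inspected from Python with numpy.show_config(). This is a minimal sketch; the wrapper name `report_numpy_linkage` is made up for illustration, and the printed details vary between NumPy versions.

```python
def report_numpy_linkage():
    """Print NumPy's build configuration; return False if NumPy is missing."""
    try:
        import numpy
    except ImportError:
        return False
    numpy.show_config()  # lists the BLAS/LAPACK libraries found at build time
    return True


if __name__ == "__main__":
    if not report_numpy_linkage():
        print("NumPy is not installed for this interpreter")
```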


1), 2), 3) NumPy, SciPy, matplotlib, IPython, pandas, SymPy
  • Last modified: 2015/11/18 11:18
  • by meesters