software:python

# Differences

This shows you the differences between two versions of the page.

software:python [2015/11/18 11:18] meesters [Load Environment Modules (module load [mod])]
software:python [2019/10/12 07:43] meesters removed
Line 3: Line 3:
===== Available versions =====
- Currently, the following versions of Python are installed and usable using the specified modulefile:
+ Currently, we have a variety of Python-Versions available as [[:setting_up_environment_modules|module files]]. To list them all run
- ^ Version   ^ Additional Packages ^ Modulefile ^
- | //2.6.6// | //None// | //None (System default at ''/usr/bin/python'')// |
- | 2.7.7     | pip, virtualenv, virtualenvwrapper, bsub, [[http://www.scipy.org/stackspec.html|The SciPy Stack]]((NumPy, SciPy, matplotlib, IPython, pandas, SymPy)), Cython, scikit-learn | ''Python/2.7.7'' |
- | 3.3.5     | pip, virtualenv, virtualenvwrapper, [[http://www.scipy.org/stackspec.html|The SciPy Stack]]((NumPy, SciPy, matplotlib, IPython, pandas, SymPy)), Cython, scikit-learn | ''Python/3.3.5'' |
- | 3.4.1     | pip, virtualenv, virtualenvwrapper, [[http://www.scipy.org/stackspec.html|The SciPy Stack]]((NumPy, SciPy, matplotlib, IPython, pandas, SymPy)), Cython, scikit-learn | ''Python/3.4.1'' |
- We recommend to **avoid Python 2.6.6** since we can provide better support for the versions that we have installed manually.
+
+ $ module avail |& grep 'lang/Python'
+
- If you need additional Python packages, you can easily install them yourself either [[#home_directory|"globally" in your home directory]] or [[#using_virtualenv|inside of a virtual environment]].
+ ==== Content of those modulefiles ====
- ===== Additional packages =====
+ === Python2 < 2.7.16 and Python3 < 3.7.4 ===
- In general, having a personal Python environment where you can install third-party packages (without needing root privileges) yourself is very easy. The preparation steps needed on Mogon are described below.
+ The Python versions available as module files provide ''numpy'', ''scipy'', ''pandas'', ''cython'' and more. However, a ''matplotlib'' module is most likely missing, because our installation framework installs it separately. Hence, ''matplotlib'' has to be loaded additionally as a [[:setting_up_environment_modules|module file]].
- While the first variant is already sufficient, using virtualenvs, we recommend using [[#using_virtualenvs|virtualenvs]] since they are a lot easier to work with.
+ The ''intel'' versions are linked against [[https://software.intel.com/en-us/intel-mkl|Intel's MKL]]. Exporting ''OMP_NUM_THREADS'' enables multithreaded matrix handling with ''numpy''.
- Virtualenvs can also be shared between users if created in your group's project directory.
- ==== Home directory ====
- First, create some directories in which installed packages will be placed:
+
+ === Python2 >= 2.7.16 and Python3 >= 3.7.4 ===
+
+ Our installation framework altered its policies to avoid the cluttering of modulefiles. Hence, when loading a Python module:
- $ mkdir -p ~/.local/bin
+ $ module load lang/Python/
- $ mkdir -p ~/.local/lib/python/site-packages
+
- Then add the created ''bin'' directory to your ''PATH'' in your ''.bashrc'' file and source it:
+ only the bare Python with a few additional libraries (or "modules" in Python-speak) is available. To use the scientific modules, load:
- $ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
+ $ module load lang/SciPy-bundle/
- $ source ~/.bashrc
+
- Now create a configuration file for ''easy_install'' and ''pip'', the Python package management tools:
+ The toolchain version of the ''bundle-version'' has to fit the compiler of the ''python-version''. The same is true for the matplotlib modules, which can be loaded as:
+ $ module load vis/matplotlib/
+
+ In this case, the Python version of the Python module and of the matplotlib module have to match, and the toolchain version has to fit the Python version.
+
+ Here is a list of matching versions:
+
+ ^ Python Compiler Version ^ SciPy-bundle or Matplotlib toolchain ^
+ | ''GCCcore-8.3.0''       | ''foss-2019a'' |
+
+ ==== Which version should be picked? ====
+
+ If you intend to use Python in combination with another module, ensure that the [[:setting_up_environment_modules#toolchains|toolchain]] and the toolchain version of the additional module fit with your selected Python module. With regard to the Python version, try to stay as current as possible.
+
+ If you need additional Python packages, you can easily install them yourself either [[#home_directory|"globally" in your home directory]] or [[#using_virtualenv|inside of a virtual environment]].
+
+ ====== Your Personal Environment (Additional Packages) ======
+
+ In general, having a personal Python environment where you can install third-party packages (without needing root privileges) yourself is very easy. The preparation steps needed on Mogon are described below.
+
+ {{:software:python_environment.png?direct&400 |https://xkcd.com/1987/}} While the first variant is already sufficient, we recommend using [[#using_virtualenvs|virtualenvs]] since they are a lot easier to work with.
+ Virtualenvs can also be shared between users if created in your group's project directory. Most importantly, virtual environments can help you avoid the [[https://xkcd.com/1987/|setup hell]] you might otherwise experience.
+
+ Do not use any of the modules ending in ''-bare'' to construct your virtual environment: they are installed as special dependencies for particular modules (or were installed by accident).
+
+ We strongly discourage using any ''*conda'' setup on one of our clusters: it has often messed up an existing environment, which is only discovered as a source of interference when switching back to our modules. There actually are ''*conda'' modules provided by us. If you try and use any ''*conda'' related material, double check the altered environment to be sure what you are doing / what ''*conda'' did.
+
+ ==== Personal Setup ====
+
+   - First load an appropriate Python module, see the implications above.
+   - Then navigate to your home directory (if in doubt, type ''cd'').
+   - Create some directories in which installed packages will be placed:
+ $ mkdir -p ~/.local/bin
+ $ mkdir -p ~/.local/lib/python/site-packages
+
+   - Now add the created ''bin'' directory to your ''PATH'' in your ''.bashrc'' file and source it:
+ $ echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
+ $ source ~/.bashrc
+
+   - Next, create a configuration file for ''easy_install'' and ''pip'', the Python package management tools:
$ echo -e '[easy_install]\nprefix = ~/.local' > ~/.pydistutils.cfg
$ mkdir -p ~/.pip
Line 55: Line 93:
It is advised to make use of [[http://www.virtualenv.org/en/latest/|virtualenv]] in Python, especially if you intend to install different combinations or versions of various Python packages. Virtualenvs can also be shared between users if created in your group's project directory.
- If you are using Python 2.6.6, you need to install ''virtualenv'':
+ In the following section we will be using '''' as a placeholder for the environment name you intend to use. Feel free to choose a name to your liking. We recommend naming the environment after its purpose and/or the Python version you intend to use.
- $ easy_install virtualenv
- Searching for virtualenv
- Reading http://pypi.python.org/simple/virtualenv/
- Best match: virtualenv 1.10.1
- [...]
- Processing dependencies for virtualenv
- Finished processing dependencies for virtualenv
We need to remove the easy_install configuration file created above, since the path set there would interfere with virtualenv:
Line 78: Line 108:
Creating a virtualenv will simply set up a directory structure and install some baseline packages:
- $ virtualenv ENV
+ $ virtualenv
- New python executable in ENV/bin/python
+ New python executable in /bin/python
Installing Setuptools...done.
Installing Pip...done.
Line 86: Line 116:
With virtualenvs, you can even make each virtualenv use its own version of the Python interpreter:
- $ virtualenv --python=/usr/bin/python2.6 ENV2.6
- $ virtualenv --python=/cluster/Apps/Python//bin/python ENV2.7
+ # after loading an appropriate module file
+ $ virtualenv --python=$(which python) --system-site-packages
- If you want to use the third-party packages numpy, scipy, matplotlib, ... which are already installed globally, you need to add the parameter ''--system-site-packages'' to your virtualenv command.
+ If you want to install the pre-installed third-party packages (numpy, scipy, matplotlib, etc.) yourself, just omit the ''--system-site-packages'' parameter when calling virtualenv.
+
+ Otherwise, append the ''LD_LIBRARY_PATH'' of the module you are using to the environment activation script:
+
+ # note the double quotes
+ echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> /bin/activate
+
+ This last statement (the ''export'') is meant to be executed once, after setting up the virtual environment, in the very shell in which the environment was created - **not** in jobscripts.
+
=== Activate ===
To work in a virtualenv, you first have to activate it, which sets some environment variables for you:
- $ source ENV/bin/activate
+ $ source /bin/activate
- (ENV)$ # Note the name of the virtualenv in front of your prompt - nice, heh?
+ ()$ # Note the name of the virtualenv in front of your prompt - nice, heh?
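Whether a script really runs inside the activated virtualenv can also be checked from within Python. The following is a minimal sketch and not part of the original page; the helper name ''in_virtualenv'' is hypothetical. It assumes that older virtualenv releases set ''sys.real_prefix'', and that newer releases (and Python 3's ''venv'') make ''sys.prefix'' differ from ''sys.base_prefix''.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Assumption: older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and venv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter: %s" % sys.executable)
    print("inside a virtualenv: %s" % in_virtualenv())
```

Calling this once at the start of a job script can make it obvious in the log file if the environment was not activated before the job started.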
Line 102: Line 143:
Now you can use your virtualenv - newly installed packages will just be installed inside the virtualenv and just be visible to the python interpreter you start from within the virtualenv:
- (ENV)$ easy_install requests
+ ()$ easy_install requests
Searching for requests
Reading https://pypi.python.org/simple/requests/
Line 112: Line 153:
or
- (ENV)$ pip install requests
+ ()$ pip install requests
Downloading/unpacking requests
Downloading requests-1.2.3.tar.gz (348kB): 348kB downloaded
Line 124: Line 165:
And now compare what happens with the python interpreter from inside the virtualenv and with the system python interpreter:
- (ENV)$ python -c 'import requests'
+ ()$ python -c 'import requests'
- (ENV)$ /usr/bin/python -c 'import requests'
+ (>ENV>)$ /usr/bin/python -c 'import requests'
Traceback (most recent call last):
File "<string>", line 1, in <module>
Line 134: Line 175:
Deactivating a virtualenv reverts the activation step and all its changes to your environment:
- (ENV)$ deactivate
+ ()$ deactivate
Line 141: Line 182:
To destroy a virtualenv, simply delete its directory:
- $ rm ENV
+ $ rm -r
Line 156: Line 197:
If you are using any other version of Python, virtualenvwrapper is already installed and you just need to
- $ echo 'source /cluster/Apps/Python//bin/virtualenvwrapper.sh' >> ~/.bashrc
+ $ echo 'source /cluster/easybuild//software/lang/Python//bin/virtualenvwrapper.sh' >> ~/.bashrc
Line 163: Line 204:
====== Load Environment Modules (module load [mod]) ======
To load environment modules in python:
+
execfile('/usr/share/Modules/init/python.py')
- module('load','gcc/4.8.2')
- module('load','software/bioinf/samtools/0.1.19')
+ module('load',)
- From Python 3.4.1 onwards we enabled on mogon a //modules// module ;-), e.g.
- import modules
- modules.module('load', 'Java/jdk1.8.0_25')
+ ====== Job submission ======
+
+ As with other interpreted languages, you can indicate the desired interpreter for a script using a [[https://en.wikipedia.org/wiki/Shebang_(Unix)|shebang]]. Here is an example script. You can adapt the ''submit()'' function to your needs (e.g. add logging functionality, account better / differently for multithreading, etc.):
+
+ #!/bin/env python
+
+ #SBATCH -p nodeshort
+ #SBATCH -A
+ #SBATCH -N1
+ #SBATCH -n 32 # assuming 2-threaded daughter processes
+               # otherwise do not specify '-c'
+               # (will be set to 1, implicitly)
+ #SBATCH -c 2  # number of cores per task, e.g. 2 threads
+ #SBATCH -t 10
+ #SBATCH -J python-demo
+ #SBATCH -o python-demo.%j.log
+
+ import subprocess
+ import shlex
+ import locale
import os
- os.environ['JAVA_HOME'] # will be '/cluster/Apps/Java/jdk1.8.0_25'
+ import glob
+
+ def submit(call, ignore_errors = False):
+     n_threads = os.environ['SLURM_CPUS_PER_TASK']
+     os.environ['OMP_NUM_THREADS'] = n_threads
+     if int(n_threads) > 1:
+         call = 'srun -n 1 -c %s --hint=multithread --cpu_bind=q %s' % (n_threads, call)
+     else:
+         call = 'srun -n 1 %s' % call
+     call = shlex.split(call)
+     process = subprocess.Popen(call, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+     out, err = process.communicate()
+     out = out.decode(locale.getdefaultlocale()[1])
+     err = err.decode(locale.getdefaultlocale()[1])
+     if (not ignore_errors) and (process.returncode):
+         print("call failed, call was: %s" % ' '.join(call))
+         print("Message was: %s" % str(out))
+         print("Error code was %s, stderr: %s" % (process.returncode, err))
+     return process.returncode, out, err
+
+ if __name__ == '__main__':
+     print(os.getcwd())
+     for fname in glob.glob('*.input'):
+         call = "your application --threads=2 --infile=%s" % fname
+         submit(call)
- This, of course, requires an environment, where the ''--system-site-packages''-option has been employed during the set up of your (currently active) python environment.
- ====== Job submission ======
+ For multinode scripts, ensure that the environment is set remotely (for most cases ''srun'' takes care of it).
- For python you can use the maybe basic but friendly bsub package from: https://github.com/brentp/bsub
+
+ Scripts employing ''mpi4py'' should not submit themselves. Scripts employing Python's onboard ''multiprocessing'' module do not need the ''submit()'' function.
+
+ ====== Multiprocessing ======
+
+ Smaller numbers of tasks can be divided amongst workers on a single node.
+ In high-level languages like Python, or in lower-level languages using threading constructs such as [[https://www.openmp.org/|OpenMP]], this can be accomplished with little more effort than a serial loop. This example also demonstrates using Python as the script interpreter for a Slurm batch script. Note, however, that since Slurm copies batch scripts to a private directory and executes them from there, it is necessary to manually add the runtime directory to the Python search path.
- from bsub import bsub
- BAM2FQ = "bam2fq --input %s .."
- STAR = "star --align .."
- SAM2BAM = "samtools view .."
- for dataset in datasets:
-     bam2fq = bsub("bam2fastq", R='span[hosts=1] affinity[core(1)]', app='Reserve1G',  n=1, q='long', W='2:00' )
-     bam2fq = bam2fq( BAM2FQ % dataset )
-     star = bam2fq.then(   STAR,    job_name="STAR_%s" % dataset,    R='span[hosts=1] affinity[core(6)]', app='Reserve30G', n=1, q='long', W='8:00' )
-     sam2bam = star.then(  SAM2BAM, job_name="SAM2BAM_%s" % dataset, R='span[hosts=1] affinity[core(1)]', app='Reserve10G', n=1, q='long', W='3:00' )
- print "First job_id:" + bam2fq.job_id
- print "Last job_id:" + sam2bam.job_id
- last = sam2bam.job_id
-
- print "still running? %s" % ( "yes" if bsub.poll(last) else "no" )
+ #!/bin/env python
+ #SBATCH --job-name=multiprocess
+ #SBATCH --output=logs/multiprocess_%j.out
+ #SBATCH --time=01:00:00
+ #SBATCH --partition=parallel  # Mogon II
+ #SBATCH --partition=nodeshort # Mogon I
+ #SBATCH --account=
+ #SBATCH --nodes=1
+ #SBATCH --exclusive
+
+ import multiprocessing
+ import sys
+ import os
+
+ # necessary to add cwd to path when script run
+ # by slurm (since it executes a copy)
+ sys.path.append(os.getcwd())
+
+ def some_worker_function(some_input): pass
+
+ # get number of cpus available to job
+ ncpus = int(os.environ["SLURM_JOB_CPUS_PER_NODE"])
+
+ # create pool of ncpus workers
+ pool = multiprocessing.Pool(ncpus)
+
+ # apply work function in parallel
+ pool.map(some_worker_function, range(100))
- ====== Things to consider ======
- Python is an interpreted language. As such it should not be used for lengthy runs in an HPC environment. Please use the availability to compile your own modules with Cython; consult the relevant [[http://cython.org/|Cython documentation]]. If you do not know how to start, attend a local Python course or schedule a meeting at our local HPC workshop.
- ====== Special packages ======
- Please note that we have already installed numpy, scipy and matplotlib in the versions of Python that we provide additionally.
- ===== NumPY =====
+ ===== MPI =====
+ Process and threaded level parallelism is limited to a single machine. To
+
+ #!/bin/env python
+ #SBATCH --job-name=mpi
+ #SBATCH --output=logs/mpi_%j.out
+ #SBATCH --time=01:00:00
+ #SBATCH --partition=parallel  # Mogon II
+ #SBATCH --partition=nodeshort # Mogon I
+ #SBATCH --ntasks=128 # e.g. 2 nodes on Mogon I
+
+ from mpi4py import MPI
+
+ def some_worker_function(rank, size): pass
+
+ comm = MPI.COMM_WORLD
+ rank = comm.Get_rank()
+ size = comm.Get_size()
+
+ some_worker_function(rank, size)
+
+ MPI programs and Python scripts must be launched using mpirun as shown in this Slurm batch script:
+
+ #!/bin/bash
+
+ #SBATCH --job-name=mpi
+ #SBATCH --output=logs/mpi_%j.out
+ #SBATCH --time=01:00:00
+ #SBATCH --partition=parallel  # Mogon II
+ #SBATCH --partition=nodeshort # Mogon I
+ #SBATCH --account=
+ #SBATCH --ntasks=100
+
+ module load
+
+ mpirun python mpi_pk.py
+
+ In this case we are only using MPI as a mechanism to remotely launch tasks on distributed nodes. All processes must start and end at the same time, which can lead to a waste of resources if some job steps take longer than others.
+
+ ====== Performance Hints ======
+
+ Many of the hints are inspired by [[https://d.cxcore.net/Python/Python_Cookbook_3rd_Edition.pdf|O'Reilly's Python Cookbook chapter on performance (Chapter 14)]]((As the link is frequently broken, please report when this happens - apparently O'Reilly does not like seeing it online elsewhere, but had it online for free in the past.)). We discuss only a little here explicitly; the chapter is worth reading. If you need help getting performance out of Python scripts, contact us.
+
+ ===== Profiling and Timing =====
+
+ Profiling how much time a program, or a task within it, takes is better than guessing. Guessing bottlenecks is hard; profiling is often worth the effort. The above-mentioned Cookbook covers this topic.
+
+ ===== Regular Expressions =====
+
+ Avoid them as much as you can.
+ If you have to use them, compile them prior to any looping, e.g.:
+
+ import re
+ myreg = re.compile(r'\d')  # raw string, so that \d reaches the regex engine unchanged
+ for stringitem in string_list:
+     re.search(myreg, stringitem)
+     # or
+     myreg.search(stringitem)
+
+ ===== Use Functions =====
+
+ A little-known fact is that code defined in the global scope runs slower than code defined in a function. The speed difference has to do with the implementation of local versus global variables (operations involving locals are faster). So, if you want to make the program run faster, simply put the scripting statements in a function (see also [[http://chimera.labs.oreilly.com/books/1230000000393/ch14.html#_problem_239|O'Reilly's Python Cookbook chapter on performance]]).
+
+ The speed difference depends heavily on the processing being performed.
+
+ ===== Selectively Eliminate Attribute Access =====
+
+ Every use of the dot (.) operator to access attributes comes with a cost. Under the covers, this triggers special methods, such as ''__getattribute__()'' and ''__getattr__()'', which often lead to dictionary lookups.
+
+ You can often avoid attribute lookups by using the ''from module import name'' form of import as well as making selected use of bound methods. See the illustration in [[http://chimera.labs.oreilly.com/books/1230000000393/ch14.html#_problem_239|O'Reilly's Python Cookbook chapter on performance]].
+
+ ===== Too many print statements =====
+
+ To avoid constant flushing (particularly in Python 2.x) and use buffered output instead, use Python's ''logging'' module, as it supports buffered output. An alternative is to write to ''sys.stdout'' and only flush at the end of a logical block.
+
+ In Python 3.x the ''print()'' function comes with a keyword argument ''flush'', which defaults to ''False''. However, use of the logging module is still recommended.
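As a minimal sketch of buffered output with ''logging'' (not taken from the original page), the standard library's ''logging.handlers.MemoryHandler'' can collect records in memory and pass them on in one go. The logger name ''demo_buffered'', the capacity of 1000 records and the in-memory ''StringIO'' target are assumptions for illustration only; a real job would more likely write to a log file.

```python
import io
import logging
import logging.handlers

# Target handler: here an in-memory stream (assumption for the demo).
stream = io.StringIO()
target = logging.StreamHandler(stream)

# Buffer up to 1000 records; flush early only for ERROR or worse.
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target
)

log = logging.getLogger("demo_buffered")  # hypothetical logger name
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output away from the root logger
log.addHandler(buffered)

for i in range(3):
    log.info("step %d done", i)

before_flush = stream.getvalue()  # nothing has been written yet
buffered.flush()                  # one write-out at the end of the logical block
after_flush = stream.getvalue()
```

The records only reach the target when the buffer is full, when a record at or above ''flushLevel'' arrives, or when ''flush()'' is called explicitly, so a tight loop does not trigger a write per message.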
+
+ ===== Working with Scalars in Numerics Code =====
+
+ Any constant scalar is best not calculated inside a loop - regardless of the programming language. Compilers might(!) optimize this away, but are not always capable of doing so.
+
+ One example (timings for the module ''tools/IPython/6.2.1-foss-2017a-Python-3.6.4'' on Mogon I; results on Mogon II may differ, but the message will hold):
+
+ Every trivial constant is re-computed if the interpreter is asked to do so:
+
+ In [1]: from math import pi
+
+ In [2]: %timeit [1*pi for _ in range(1000)]
+    ...:
+ 149 µs ± 6.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
+
+ In [3]: %timeit [pi for _ in range(1000)]
+ 87.1 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
+
+ The effect is more pronounced if division is involved((for compiled functions, particularly - in interpreted code, as shown here, the effect is limited as every number is a Python object, too)):
+
+ In [4]: some_scalar = 300
+
+ In [5]: pi_2 = pi / 2
+
+ In [6]: %timeit [some_scalar / (pi / 2) for _ in range(1000)]
+ 249 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
+
+ In [7]: %timeit [some_scalar / pi_2 for _ in range(1000)]
+ 224 µs ± 5.62 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
+
+ Solution: Some evaluations are best placed outside of loops and bound to a variable.
+
+ ===== Compile Code!!! =====
+
+ Remember that every Python module on Mogon comes with [[http://cython.org/|Cython]]. Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language.
+
+ While we cannot give a comprehensive intro in this wiki document, we recommend using Cython whenever possible and give this little example:
+
+ Imagine you have a (tested) script that you need to call frequently.
+ Then create modules your main script can import and write a setup script like this:
+
+ #!/usr/bin/env python
+ # script: setup.py
+
+ import os
+ from distutils.core import setup
+ from distutils.extension import Extension
+ from Cython.Distutils import build_ext
+
+ named_extension = Extension(
+     "name of your extension",
+     ["directory_of_your_module/.pyx",
+      "directory_of_your_module/.pyx"],
+     extra_compile_args=['-fopenmp'],
+     extra_link_args=['-fopenmp'],
+     # 'include_dirs' is the Extension keyword for header search paths
+     include_dirs = os.environ['CPATH'].split(':')
+ )
+
+ setup(
+     name = "some_name",
+     cmdclass = {'build_ext': build_ext},
+     ext_modules = [named_extension]
+ )
+
+ Replace ''named_extension'' with a name of your liking, and fill in all placeholders. You can now call the setup script like this:
+
+ $ python ./setup.py build_ext --inplace
+
+ This creates a file ''directory_of_your_module/.c''; the subsequent compilation step produces a file ''directory_of_your_module/.so''.
+
+ In Cython you can release the global interpreter lock (GIL), see [[http://docs.cython.org/src/userguide/external_C_code.html|this document (scroll down a bit)]], when not dealing with pure Python objects.
+
+ In particular, [[http://docs.cython.org/src/userguide/numpy_tutorial.html|Cython works with ''numpy'']].
+
+ ===== Memory Profiling =====
+
+ Profiling memory is a special topic in itself. There is, however, the Python module [[https://pypi.python.org/pypi/memory_profiler|"memory profiler"]], which is really helpful if you have an idea where to look. There is also [[https://pypi.python.org/pypi/Pympler|Pympler]], yet another such module.
+ ====== Things to consider ======
+
+ Python is an interpreted language. As such it should not be used for lengthy runs in an HPC environment. Please make use of the possibility to compile your own modules with Cython; consult the relevant [[http://cython.org/|Cython documentation]].
+ If you do not know how to start, attend a local Python course or schedule a meeting at our local HPC workshop.
- http://www.numpy.org/
- When installing NumPY, the first installation attempt fails at exit. Don't worry, the installation is already finished then, but to be sure, you can simply run the command again to see it exiting cleanly.
- Note that NumPY can also be linked against the [[software:mkl|Intel Math Kernel Library]] or the [[software:acml|AMD Core Math Library]]:
-   * MKL: http://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl
-   * ACML: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062309.html