NGS Read Mapping Software on Mogon
This page is currently under construction.
As a first introduction to NGS alignment software tools, we recommend reading this short blog post. In other words: the list of supported tools may keep growing in response to your requests, but it will never really cover everybody's favorite tool.
Notwithstanding our own benchmarks, a first impression can be found in the same blog.
Software Options
BWA
BWA is a mapping tool designed particularly to map “low-divergent sequences against a large reference genome”. Modules on Mogon can be found as1):
bio/BWA
The Wrapper Script
The wrapper script is not installed yet.
To scale the task from mapping one (or a few) samples to mapping several in parallel, we provide a wrapper script, which is available as a module:
bio/parallel_BWA
The code is under version management and hosted internally, here.
The wrapper script submits a job itself: it is not intended to be run within a SLURM environment, but rather creates one.
Calling parallel_BWA -h will display a help message with all the options the script provides. Likewise, the call parallel_BWA --credits will display credits and a version history.
After loading the module, the script can be run like this:
$ parallel_BWA [options] <referencedir> <inputdir>
Limitations:
- The wrapper recognizes FASTQ files with the suffixes “*.gz”, “*.fastq” or “*.fq” and will always assume FASTQ files (compressed or uncompressed).
- The number of processes (and therefore nodes) is limited to the number of samples.
- The wrapper only works for paired-end sequencing data, where the file tuples are designated with the strings “_1” and “_2” or “_R1” and “_R2”, respectively.
- BWA does not scale well to big data. It is better to split the input into chunks of ~1 GB.
- BWA does not scale well beyond a NUMA block (8 threads on Mogon I).
- There are only a few options, as internally the wrapper calls bwa mem and only sets up a few things to yield performance.
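The advice to split large inputs into chunks can be sketched as follows. This is a minimal sketch, not part of the wrapper: the function name split_fastq and its arguments are hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record. The number of records that makes up roughly 1 GB depends on the read length and has to be chosen by the caller.

```shell
# Sketch: split an uncompressed FASTQ file into chunks of whole records.
# Assumption: every record spans exactly 4 lines, so the chunk size in
# lines is a multiple of 4. All names here are hypothetical.
#   $1  input FASTQ file
#   $2  records per chunk (pick a value that yields roughly 1 GB)
#   $3  prefix for the chunk files (split appends aa, ab, ...)
split_fastq() {
    fastq=$1
    records=$2
    prefix=$3
    split -l "$(( records * 4 ))" "$fastq" "$prefix"
    # Give the chunks a suffix that is recognized as FASTQ.
    for chunk in "$prefix"??; do
        [ -e "$chunk" ] || continue
        mv "$chunk" "$chunk.fastq"
    done
}
```

For paired-end data, both mates have to be split with the same number of records per chunk so that the chunks stay in sync. Compressed input needs to be decompressed first (for example with gzip -dc). Whether the wrapper pairs the resulting chunk names as intended should be checked before a large run.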
About Arguments:
- referencedir needs to be the (relative) path to a directory containing an indexed BWA reference.
- inputdir needs to be a (relative) path to a directory containing all inputs. Subdirectories and files containing the string “unpaired” are ignored; this is to support preprocessing with the trimmomatic module.
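To check beforehand which files in an input directory would form sample pairs under the naming rules described on this page, something like the following could be used. It is a sketch that mirrors the documented rules, not the wrapper's own code: the function name list_read_pairs is hypothetical, and it only looks at the top level of the directory, since the text does not say whether the wrapper descends into other subdirectories.

```shell
# Sketch: print "forward<TAB>reverse" for every read pair found in a
# directory. Mirrors the documented rules; not the wrapper's own logic.
list_read_pairs() {
    inputdir=$1
    for fwd in "$inputdir"/*; do
        [ -f "$fwd" ] || continue          # skip subdirectories
        name=${fwd##*/}
        case $name in
            *unpaired*) continue ;;        # e.g. trimmomatic leftovers
        esac
        case $name in
            *.gz|*.fastq|*.fq) ;;          # recognized FASTQ suffixes
            *) continue ;;
        esac
        # Derive the expected mate name from the forward-read name.
        case $name in
            *_R1.*) mate=${name%_R1.*}_R2.${name##*_R1.} ;;
            *_1.*)  mate=${name%_1.*}_2.${name##*_1.} ;;
            *) continue ;;
        esac
        if [ -f "$inputdir/$mate" ]; then
            printf '%s\t%s\n' "$name" "$mate"
        fi
    done
}
```

Counting the output lines (list_read_pairs <inputdir> | wc -l) gives the number of samples, which is also the upper bound for the number of processes the wrapper can use.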
The options:
- parallel_BWA attempts to deduce your SLURM account. This may fail, in which case -A, --account needs to be supplied.
- -N, --nodes allows reserving more than the default of 1 node. This may speed up the screening; see the limitations above.
- -d, --dependency: comma-separated list of job IDs that the job will wait for before it starts.
- -l, --runlimit: defaults to 300 minutes.
- -p, --partition: the default is nodeshort, or parallel on Mogon2; no smp partition should be chosen.
- -t, --threads: BWA can work in parallel. Please consult the manual. The default is 8.
- -o, --outdir: output directory path (default is the current working directory).
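How the options combine into one call can be illustrated with a small helper that only prints the command line instead of running it. This is a sketch: the function name and all values are placeholders, and it assumes that long options take their value as a separate word. The authoritative syntax is what parallel_BWA -h reports.

```shell
# Sketch: print (not run) a parallel_BWA call built from the documented
# options. All argument values are placeholders chosen by the caller.
print_parallel_bwa_call() {
    account=$1        # SLURM account, needed if auto-detection fails
    nodes=$2          # should not exceed the number of samples
    outdir=$3
    referencedir=$4
    inputdir=$5
    # --threads 8 and --runlimit 300 repeat the documented defaults.
    printf 'parallel_BWA --account %s --nodes %s --threads 8 --runlimit 300 --outdir %s %s %s\n' \
        "$account" "$nodes" "$outdir" "$referencedir" "$inputdir"
}
```

The printed line can be reviewed and then pasted into a shell in which the bio/parallel_BWA module has been loaded.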
BarraCuda
The Wrapper Script
RazerS3
The Wrapper Script
Bowtie2
Bowtie2 is a well known read aligner with a focus on gapped alignments.
As preliminary scaling tests indicate that the program can scale to a full node and is still reasonably fast, no wrapper script has been installed as a module, so far2). Instead, a few samples are given:
A Sample Script
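A minimal sketch of what a single-node job script for one paired-end sample could look like, assuming standard sbatch directives and the usual bowtie2 options (-p for threads, -x for the index basename, -1/-2 for the mates, -S for the SAM output). The helper name print_bowtie2_job, the argument order and all values are hypothetical, and the name of the Bowtie2 module is not given on this page, so loading it is left as a comment.

```shell
# Sketch: print a SLURM job script for one paired-end Bowtie2 run.
# All arguments are placeholders:
#   $1 account  $2 partition  $3 threads  $4 index basename
#   $5 forward reads  $6 reverse reads  $7 SAM output file
print_bowtie2_job() {
    account=$1; partition=$2; threads=$3; index=$4
    fwd=$5; rev=$6; out=$7
    cat <<EOF
#!/bin/bash
#SBATCH --account=$account
#SBATCH --partition=$partition
#SBATCH --nodes=1
#SBATCH --cpus-per-task=$threads
#SBATCH --time=300

# Load the Bowtie2 module here (module name not stated on this page).

bowtie2 -p $threads -x $index -1 $fwd -2 $rev -S $out
EOF
}
```

The output can be redirected into a file and handed to sbatch, for example print_bowtie2_job <account> <partition> 16 <index> <reads_1> <reads_2> <out.sam> > job.sh followed by sbatch job.sh.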
STAR
The Wrapper Script
segemehl
segemehl seems to be a pretty good alignment tool; it is mentioned here because of the blog cited below.
There will be no wrapper script for segemehl: if this comparison bears any truth, the software might be really good, but it is also pretty memory-hungry, and several tens of GB per core is just too much. If you want to try segemehl, be sure to write your own wrapper script (perhaps staging in the reference to a local scratch, not the ramdisk) and reserve sufficient memory. Be aware that you will be accounted for the prolonged run time and memory.
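The stage-in step of such a self-written wrapper could look like the sketch below. The function name stage_in_reference is hypothetical, and the scratch location is passed in by the caller because the path of the node-local scratch on Mogon is not stated here.

```shell
# Sketch: copy a reference directory to a scratch location and print the
# path of the copy. Both arguments are supplied by the caller.
#   $1  directory holding the reference (and its index)
#   $2  scratch directory on the compute node (assumed to be writable)
stage_in_reference() {
    referencedir=$1
    scratch=$2
    target=$scratch/$(basename "$referencedir")
    mkdir -p "$target" &&
    cp -R "$referencedir"/. "$target"/ &&
    printf '%s\n' "$target"
}
```

Inside a job script, the returned path would then be used for the segemehl call, and the memory would be requested explicitly, for example with the sbatch option --mem.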
Comparison Benchmarks
This part needs some more time to be finished ….