NGS Read Mapping Software on Mogon

This page is currently under construction.

As a first introduction to NGS alignment software tools, we recommend reading this short blog post. In other words: the list of supported tools may keep growing in response to your requests, but it will never cover everybody's favorite tool.

Our own benchmarks notwithstanding, a first impression of how the tools compare can be found in the same blog.

Software Options

BWA

BWA is a mapping tool designed particularly for mapping “low-divergent sequences against a large reference genome”. Modules on Mogon can be found as 1):

bio/BWA

The Wrapper Script

The wrapper script is currently installed on Mogon II only. Pending an update of the ramdisk plugin, it will be provided on Mogon I, too.

To scale the task from mapping one (or a few) samples to mapping several in parallel, we provide a wrapper script, which is available as a module:

bio/parallel_BWA

The code is under version management and hosted internally, here.

The wrapper script submits a job itself: it is not intended to be run inside a SLURM job, but rather creates one.

Calling parallel_BWA -h displays a help message with all the options the script provides. Likewise, parallel_BWA --credits displays credits and a version history.

The script, after loading the module, can then be run like:

$ parallel_BWA [options] <referencedir> <inputdir>

Limitations:

  • The wrapper recognizes input files with the suffixes “*.gz”, “*.fastq” or “*.fq” and will always assume that they are FASTQ files (compressed or uncompressed).
  • The number of processes (and therefore nodes) is limited to the number of samples.
  • The wrapper only works for paired end sequencing data, where the file tuples are designated with the following strings “_1” and “_2” or “_R1” and “_R2”, respectively.
  • BWA does not scale well to big data. It is better to split the input into chunks of ~1GB.
  • BWA does not scale well beyond a NUMA block (8 threads on Mogon I)
  • There are only a few options, as internally the wrapper calls bwa mem and only sets up a few things to yield performance.
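
The chunking advice in the list above can be approximated with standard tools. The following is a minimal sketch and not part of the wrapper: the function name split_fastq, the chunk naming scheme and the number of reads per chunk are assumptions, and it presumes four-line FASTQ records and GNU split. Applying the same read count to both mate files keeps the pairs in sync across chunks.

```shell
#!/bin/bash
# Sketch: split one FASTQ file into chunks holding a fixed number of reads.
# Assumes 4-line FASTQ records and GNU split (needed for --additional-suffix).
# Usage: split_fastq <input> <reads_per_chunk> <output_prefix> <mate_suffix>
split_fastq() {
    local input="$1" reads="$2" prefix="$3" mate="$4"
    local lines=$(( reads * 4 ))          # one FASTQ record spans 4 lines
    case $input in
        *.gz) gzip -dc "$input" ;;        # decompress on the fly
        *)    cat "$input" ;;
    esac | split -l "$lines" -d -a 3 --additional-suffix="${mate}.fastq" - "$prefix"
}

# Example with hypothetical file names; choose the read count so that one
# chunk is roughly 1GB. Chunks are named sample.part000_1.fastq, and so on.
# split_fastq sample_1.fastq.gz 4000000 chunks/sample.part _1
# split_fastq sample_2.fastq.gz 4000000 chunks/sample.part _2
```

The chunks are written uncompressed; whether the chosen names satisfy the wrapper's “_1”/“_2” matching should be checked on a small test set first.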

About Arguments:

  • referencedir needs to be the (relative) path to a directory containing an indexed BWA reference
  • inputdir needs to be a (relative) path to a directory containing all inputs. Subdirectories and files containing the string unpaired are ignored; this is to support preprocessing with the trimmomatic module.
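
Before submitting, it can help to verify that every forward-read file in inputdir has a mate. The sketch below mirrors the naming rules described above (“_1”/“_2”, “_R1”/“_R2”, names containing unpaired are skipped); it is not the wrapper's own code, the function name check_pairs is hypothetical, and unlike the wrapper it does not restrict the file suffix.

```shell
#!/bin/bash
# Sketch: list forward-read files in <inputdir> that lack a mate file.
# Mirrors the naming rules described above; this is not the wrapper's code.
check_pairs() {
    local inputdir="$1" fwd base mate missing=0
    for fwd in "$inputdir"/*_1.* "$inputdir"/*_R1.*; do
        [ -e "$fwd" ] || continue                  # pattern matched nothing
        base=${fwd##*/}
        case $base in
            *unpaired*) continue ;;                # skipped, as described above
            *_R1.*) mate="${base%%_R1.*}_R2.${base#*_R1.}" ;;
            *)      mate="${base%%_1.*}_2.${base#*_1.}" ;;
        esac
        if [ ! -e "$inputdir/$mate" ]; then
            echo "missing mate for $base (expected $mate)" >&2
            missing=$(( missing + 1 ))
        fi
    done
    [ "$missing" -eq 0 ]                           # success only if all pairs are complete
}

# Example with a hypothetical directory name:
# check_pairs fastq/ && echo "all pairs complete"
```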

The options:

  • parallel_BWA attempts to deduce your SLURM account. This may fail, in which case -A, --account needs to be supplied.
  • -N, --nodes: allows reserving more than 1 node (the default). This may speed up the mapping; see the limitations above.
  • -d, --dependency: a comma-separated list of job IDs that the job will wait for before it starts.
  • -l, --runlimit: the run time limit; this defaults to 300 minutes.
  • -p, --partition: the default is nodeshort, or parallel on Mogon II; no smp partition should be chosen.
  • -t, --threads: the number of threads per BWA process, as BWA can work in parallel (please consult the BWA manual). The default is 8.
  • -o, --outdir: the output directory path (default is the current working directory).
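
As an illustration of how the options above combine, here is a minimal sketch of a call. It uses only the short options listed above; all values (account name, node and thread counts, directory names) are placeholders, the helper name submit_mapping is hypothetical, and it is an assumption that -l takes a plain number of minutes.

```shell
#!/bin/bash
# Sketch of a wrapper call using only the options listed above.
# -A: account (only needed if auto-detection fails), -N: nodes,
# -t: threads per BWA process, -l: run limit (assumed to be in minutes),
# -o: output directory. All values are placeholders.
submit_mapping() {
    local referencedir="$1" inputdir="$2"
    parallel_BWA \
        -A myaccount \
        -N 4 \
        -t 8 \
        -l 120 \
        -o mapped \
        "$referencedir" "$inputdir"
}

# On Mogon, after loading the module:
#   module load bio/parallel_BWA
#   submit_mapping reference/ fastq/
```

Requesting more nodes than there are sample pairs gains nothing, as noted in the limitations.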

Output:

  • Per input tuple (paired sequencing data, only) a sorted BAM file with the prefix of the input will be written.

Currently, the wrapper supports a call like bwa mem … | samtools view -Shb -o …, with flags controlling parallelism. Supporting additional flags would require adding more boilerplate code to the wrapper. See below for a note on improving wrapper scripts.
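
For a single read pair, the same kind of pipeline can be run by hand. The following is a minimal sketch under stated assumptions, not the wrapper's actual code: the function name map_pair and all file names are hypothetical, and it stops at the SAM-to-BAM conversion, so the resulting BAM file is not sorted.

```shell
#!/bin/bash
# Sketch: map one read pair by hand with the pipeline named above.
# Usage: map_pair <index_prefix> <reads_1> <reads_2> <out.bam> [threads]
map_pair() {
    local index="$1" r1="$2" r2="$3" out="$4" threads="${5:-8}"
    # bwa mem writes SAM to stdout; samtools view converts it to BAM.
    # -h keeps the header, -b selects BAM output, -o names the output file;
    # -S (SAM input) is only needed by old samtools releases.
    bwa mem -t "$threads" "$index" "$r1" "$r2" |
        samtools view -Shb -o "$out" -
}

# Example with hypothetical file names, after loading bio/BWA and a samtools module:
# map_pair reference/genome.fa sample_1.fastq.gz sample_2.fastq.gz sample.bam
```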

BarraCuda

BarraCuda is a GPU-accelerated implementation of BWA and can be found on Mogon as the module

bio/barracuda

It does not support bwa mem … but rather offloads bwa aln … to GPUs.

The Wrapper Script

Razer3

The Wrapper Script

Bowtie2

Bowtie2 is a well-known read aligner with a focus on gapped alignments.

As preliminary scaling tests indicate that the program scales to a full node and is still reasonably fast, no wrapper script has been installed as a module so far 2). Instead, a few samples are given:

A Sample Script
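
A minimal sketch of a single-node job script for one read pair could look as follows. Everything site-specific in it is an assumption: the module name bio/Bowtie2, the account, the partition, the core count, the time limit and all file names are placeholders to be adapted. The Bowtie2 options used are -p (threads), -x (index basename), -1/-2 (mate files) and -S (SAM output).

```shell
#!/bin/bash
# Placeholders below: account, partition, core count and time limit.
#SBATCH --job-name=bowtie2_map
#SBATCH --account=myaccount
#SBATCH --partition=parallel
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=05:00:00

# Sketch: align one read pair with Bowtie2 using all reserved cores.
# Usage: run_bowtie2 <index_basename> <reads_1> <reads_2> <out.sam>
run_bowtie2() {
    bowtie2 -p "${SLURM_CPUS_PER_TASK:-1}" -x "$1" -1 "$2" -2 "$3" -S "$4"
}

# Only act when running as a SLURM job; file names are placeholders.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Assumed module name; check `module avail` for the real one.
    module load bio/Bowtie2
    run_bowtie2 index/genome sample_1.fastq.gz sample_2.fastq.gz sample.sam
fi
```

Saved as, for example, bowtie2_job.sh, the script would be submitted with sbatch bowtie2_job.sh.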

STAR

The Wrapper Script

segemehl

segemehl appears to be a good alignment tool; it is mentioned here because of the blog post cited below.

There will be no wrapper script for segemehl: if this comparison bears any truth, the software might be really good, but it is also quite memory hungry, and several tens of GB per core is simply too much. If you want to try segemehl, be sure to write your own wrapper script (perhaps staging in the reference to a local scratch, not the ramdisk) and reserve sufficient memory. Be aware that you will be accounted for the prolonged run time and memory.

Comparison Benchmarks

This part needs some more time to be finished ….

1) Loading a module without a version specification will load the most recent one.
2) If you feel a workflow logic can profit from a wrapper, please approach us.
Last modified: 2018/09/20 17:18 by meesters