Genome Assembly

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing. It is available as a module:

bio/canu

Quick Start Example - Escherichia coli K12

This section briefly explains how to submit Canu's quick start example on Mogon I.

Download the P6-C4 chemistry data set released by Pacific Biosciences with

$ curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

to your desired directory and use the following batch script:

CanuQuickStart.slurm
#!/bin/bash
 
#SBATCH -J canuTest              # Job name
#SBATCH -o canuTestLog.%j.out    # Specify stdout output file (%j expands to jobId)
#SBATCH -p nodeshort             # Partition name ('parallel' on Mogon II)
#SBATCH -N 1                     # Total number of nodes requested (64 cores per Mogon I node)
#SBATCH -c 64                    # Total number of cores for the single task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH -A <account>             # Specify allocation to charge against
##SBATCH --mem=<value>           # optional: remove comment, if more than the partition default is required
 
# Loading modules: 
module load bio/canu/1.6-foss-2017a
 
# Launch the executable once
srun canu -p ecoli -d ecoli-pacbio genomeSize=4.8m -pacbio-raw pacbio.fastq maxThreads=64 useGrid=false

The script launches one executable with 64 threads, using the entire memory of a node. Canu auto-detects the available computational resources and scales itself accordingly. If needed, Canu can be restricted to a certain amount of memory with maxMemory=<amount in GiB>.
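As a minimal sketch of such a restriction, the snippet below assembles the Canu call from the quick start with an additional maxMemory setting. The variable CANU_MAX_MEM_GB and its fallback of 32 are hypothetical and only serve as an illustration. SLURM_CPUS_PER_TASK is set by SLURM when -c is requested; the fallback of 64 mirrors the batch script above.

```shell
#!/bin/bash
# Threads: use the SLURM allocation if present, otherwise 64 as in the quick start script.
threads="${SLURM_CPUS_PER_TASK:-64}"

# Memory cap for Canu. CANU_MAX_MEM_GB is a hypothetical variable for this sketch;
# 32 is an arbitrary example value, so pick one that fits your --mem request.
max_mem_gb="${CANU_MAX_MEM_GB:-32}"

# Same arguments as in the quick start, plus maxMemory.
canu_args="-p ecoli -d ecoli-pacbio genomeSize=4.8m -pacbio-raw pacbio.fastq maxThreads=${threads} maxMemory=${max_mem_gb} useGrid=false"

# Print the command for inspection only; nothing is launched here.
echo "srun canu ${canu_args}"
```

Within the batch script, the printed line would take the place of the original srun call. If the memory is capped this way, it is probably sensible to request a matching amount with the optional --mem line of the script.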

The MaSuRCA (Maryland Super Read Cabog Assembler) assembler claims to combine the benefits of de Bruijn graph and Overlap-Layout-Consensus assembly approaches.

The modules are available as:

bio/MaSuRCA

Platanus is available as a module:

bio/Platanus

We do have a module for Trinity: bio/Trinity.

However:

Trinity is not a single piece of software, but rather three consecutive programs1). As these place different demands on resources, running Trinity as-is would waste time and fairshare, and the run itself would be throttled. In the LSF days we had an unannounced wrapper script. It was never announced because, after it was established for one particular group, nobody else approached us about it.

If you are considering using Trinity, please approach us and we will re-establish this script with adaptations for SLURM.

Why do we not do it right now? It would require time, data and cooperation (feedback).

SPAdes is available as a module:

bio/SPAdes

Velvet2) is available as modules adhering to this naming scheme: bio/Velvet/<version>-<toolchain>-<kmer-info>.

As the k-mer size is fixed at compile time, it has to be selected by choosing the respective module.
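To find out which builds exist, the installed modules can be listed by prefix. The following is a sketch, assuming a module system that provides the usual `module avail` subcommand. The function name list_velvet_modules is made up for this example, and no concrete version or k-mer value is implied.

```shell
#!/bin/bash
# Prefix taken from the naming scheme above.
velvet_prefix="bio/Velvet"

# Hypothetical helper: list all modules below the Velvet prefix.
list_velvet_modules() {
    echo "Modules matching ${velvet_prefix}:"
    if command -v module >/dev/null 2>&1; then
        # Many module systems print the listing on stderr, so merge the streams.
        module avail "${velvet_prefix}" 2>&1
    else
        echo "(the 'module' command is not available in this shell)"
    fi
}

list_velvet_modules
```

Once a suitable entry is found, load it with module load followed by the full name as listed, so that the k-mer setting of the build matches the intended assembly.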

There is no workflow integration yet. Due to the lack of feedback, support is restricted to SaaS. If you would like to change that or need further support, please approach us, bearing in mind that support is limited by the available manpower.


1)
hence the name, in case you wondered
2)
Development seems to have ceased.
  • Last modified: 2019/10/09 21:56
  • by meesters