
The SeqAn Library and Applications

What is SeqAn? To quote the webpage:

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.

Multiple applications have been developed on top of SeqAn (see the corresponding section of the webpage for a comprehensive overview).

Loading the module

bio/SeqAn

will add those to the path.
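To see what a module actually adds to the path, one can compare PATH before and after loading it. The following is a minimal sketch, assuming the usual module command of an environment-modules system is available; new_path_dirs is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: print every directory that appears in the PATH
# value $2 but not in the PATH value $1.
new_path_dirs() {
    printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r dir; do
        case ":$1:" in
            *":$dir:"*) ;;                      # already present before loading
            *) printf '%s\n' "$dir" ;;          # newly added entry
        esac
    done
}

old_path="$PATH"

# Assumption: `module` exists only on the cluster; skip loading elsewhere.
if command -v module >/dev/null 2>&1; then
    module load bio/SeqAn
fi

# List the executables in the directories the module added (if any).
new_path_dirs "$old_path" "$PATH" | while IFS= read -r dir; do
    ls "$dir"
done
```

The same comparison works for any other module on this page by changing the module name.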

No Wrappers in Place

There are no wrapper scripts in place. When launching several of these applications concurrently, take care to avoid I/O contention (e.g. stage in your data to a local disk or ramdisk). Contact us if you want to integrate any of these tools into your workflow.
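Staging in could look roughly like the sketch below. It assumes that $TMPDIR (falling back to /tmp) points to node-local storage, which depends on the system and the job setup; stage_and_run is a hypothetical helper, and the command passed to it stands in for whichever tool you want to run.

```shell
# Hypothetical helper: stage one input file to node-local scratch, run a
# command on the staged copy, and copy the command's output back.
# Usage: stage_and_run INPUT OUTDIR COMMAND [ARGS...]
stage_and_run() {
    sr_input="$1"
    sr_outdir="$2"
    shift 2
    # Assumption: $TMPDIR (or /tmp) is local to the node; a ramdisk such as
    # /dev/shm could be used instead if the data fits into memory.
    sr_scratch="$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX")" || return 1
    sr_status=0
    cp "$sr_input" "$sr_scratch/" \
        && ( cd "$sr_scratch" && "$@" "$(basename "$sr_input")" > result.out ) \
        && mkdir -p "$sr_outdir" \
        && cp "$sr_scratch/result.out" "$sr_outdir/" \
        || sr_status=1
    rm -rf "$sr_scratch"    # always remove the staged copy
    return "$sr_status"
}

# Example call (placeholder paths and tool name):
# stage_and_run /path/to/reads.fastq ./results some_tool --some-option
```

Only the final result file travels back to the shared file system, so concurrent runs read their input from local storage rather than competing for the same shared file.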

Developers are best served by cloning the current version of the source. See the docs for further installation and API instructions.

BBMap short read aligner, and other bioinformatic tools

What is BBMap? To quote the webpage:

This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher). All tools are efficient and multithreaded. BBMap: Short read aligner for DNA and RNA-seq data. Capable of handling arbitrarily large genomes with millions of scaffolds. Handles Illumina, PacBio, 454, and other reads; very high sensitivity and tolerant of errors and numerous large indels. Very fast. BBNorm: Kmer-based error-correction and normalization tool. Dedupe: Simplifies assemblies by removing duplicate or contained subsequences that share a target percent identity. Reformat: Reformats reads between fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64, at over 500 MB/s. BBDuk: Filters, trims, or masks reads with kmer matches to an artifact/contaminant file. …and more!

Loading the module

bio/BBMap

will add those to the path.

Two Words of Caution

  1. The BBMap project is written entirely in Java. This does not necessarily play well with batch systems. Please have a look at our hints on Java, or approach us and report specific issues 1).
  2. Mapping software can cause I/O contention. If in doubt, we can check your job script.
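One common source of trouble with Java in batch jobs is that the JVM sizes its heap and thread count from the whole node rather than from the job's allocation. The sketch below passes both explicitly to bbmap.sh through its threads= parameter and the -Xmx flag. It assumes a Slurm job in which SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are set, which depends on how the job was submitted; java_heap_mb, the 85% factor, and the file names are illustrative choices, not recommendations from the BBMap authors.

```shell
# Hypothetical helper: heap size in MB, leaving roughly 15% of the
# allocation for JVM overhead outside the heap.
java_heap_mb() {
    echo $(( $1 * 85 / 100 ))
}

# Assumption: inside a Slurm job; fall back to small defaults elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"
heap_mb="$(java_heap_mb "${SLURM_MEM_PER_NODE:-4096}")"

# Placeholder file names. The command line is printed rather than run;
# drop the leading `echo` inside a job script after loading bio/BBMap.
echo bbmap.sh in=reads.fq ref=reference.fa out=mapped.sam \
    "threads=${threads}" "-Xmx${heap_mb}m"
```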

No Wrappers in Place

There are no wrapper scripts in place. When launching several of these tools concurrently, take care to avoid I/O contention (e.g. stage in your data to a local disk or ramdisk). Contact us if you want to integrate any of these tools into your workflow.

The NCBI SRA (Sequence Read Archive)

The SRA toolkit is a collection of tools for using data in the INSDC Sequence Read Archives.

Using the toolkit is documented in a comprehensive handbook.

Modules on MOGON

The toolkit is available as a module:

bio/SRA-Toolkit/<version/build>

Loading just bio/SRA-Toolkit will load the most recent version.

We highly recommend choosing a version >= 2.10 because of its simpler configuration management, which is partly incompatible with earlier versions.
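A minimal sketch of how one might check the loaded version against this recommendation. It assumes GNU sort (for the -V version ordering) and that fastq-dump --version prints the version number as the last field of its output; version_at_least is a hypothetical helper, not part of the toolkit.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Relies on `sort -V` (version sort, GNU coreutils).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only touch the module system and the toolkit where they exist.
if command -v module >/dev/null 2>&1; then
    module avail bio/SRA-Toolkit      # list the installed versions/builds
    module load bio/SRA-Toolkit       # no version given: most recent one
fi

if command -v fastq-dump >/dev/null 2>&1; then
    # Assumption: the version is the last field of the first non-empty line.
    sra_version="$(fastq-dump --version | awk 'NF { print $NF; exit }')"
    if version_at_least "$sra_version" 2.10; then
        echo "SRA toolkit $sra_version: OK"
    else
        echo "SRA toolkit $sra_version is older than 2.10" >&2
    fi
fi
```

A plain string comparison would rank 2.9 above 2.10, which is why the sketch uses a version-aware sort.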

Choosing the Storage Path(s)

Among other configuration settings, the SRA toolkit lets you choose the storage path where the so-called dump tools store their files:

$ vdb-config --interactive

will open an interactive dialogue, where you can choose the so-called location of user-repository in the CACHE tab.

This user repository can be located either in your $HOME or in your project path. Because the stored data can take up considerable space, and because it can be regarded as data shared within a project, we recommend choosing the project path. This can be any one of:
  • On MOGON I, select /project/<project-name>/<optional sub-path>.
  • On MOGON II, select /lustre/project/<project-name>/<optional sub-path> or a project path in the scratch file system.

Users with projects on both systems may change this setting at any time.

vdb-config also lets you set the download path for prefetched files: in the TOOLS tab you can select either your home directory or the current directory. For the same two reasons as for the user repository (data volume and sharing within the project), we recommend selecting the current working directory and changing to the actual project directory before running the prefetch or dump command.
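With the current directory selected as download path, a download could be sketched as follows. It uses the toolkit's prefetch and fasterq-dump commands; fetch_into_project is a hypothetical helper, and both the project directory and the run accession in the example call are placeholders.

```shell
# Hypothetical helper: change into the project directory, then download
# the run and convert it to FASTQ there.
# Usage: fetch_into_project PROJECT_DIR ACCESSION
fetch_into_project() {
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$1" || exit 1
        prefetch "$2" && fasterq-dump "$2"
    )
}

# Example call (placeholder directory and accession), after loading
# bio/SRA-Toolkit:
# fetch_into_project /path/to/your/project/sra-data SRR000001
```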


1)
Issues with the application itself should be reported to its developers.
  • Last modified: 2020/10/02 14:44
  • by jrutte02