Local Alignment Searches

Comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA and RNA, is an integral part of phylogenetics and metagenomics, among other fields.

BLAST+

BLAST+ from the NCBI (National Center for Biotechnology Information) is the “canonical” implementation of a local alignment search tool, well established and maintained.

We provide this software as modules under:

bio/BLAST+

A wrapper to ease your workflow is described below.

BLAT

BLAT is an alternative implementation and long-time competitor of BLAST+. In particular, it strives to be faster than BLAST+.

We provide this software as modules under:

bio/BLAT

Diamond

Diamond is yet another implementation. Restricted to protein sequences, it is considerably faster than blastp.

We provide this software as modules under:

bio/DIAMOND

Lambda

Lambda claims to be faster than blastn/p/x while operating in a compatible fashion.

We provide this software as modules under:

bio/lambda – the application is lambda2

We provide a wrapper module on Mogon to aggregate jobs and searches.

The wrapper script is available as a module:

bio/parallel_BLAST

The code is under version management and hosted internally, here.

Although it is the oldest wrapper for life scientists on Mogon, this wrapper module has serious shortcomings:

  • it currently supports only BLAST+ applications
  • the output is not automatically merged for convenient analysis in downstream tools
  • reference databases are limited in size: only those that fit into the RAM of a node can be analyzed
  • the batch system abstraction does not reach as far as in the other modules, e.g. users have to estimate the required ramdisk size (see below) themselves
  • the wrapper module is currently only implemented for Mogon I

If you need a tool or wrapper that goes beyond these limitations, get in touch with us.
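Since the ramdisk size has to be estimated by hand, the following is a minimal sketch of how such an estimate could be made. It rests on assumptions that are not confirmed by the wrapper itself: that the ramdisk has to hold all files sharing the database prefix, and that 20 % headroom is enough for everything else. The function name estimate_ramdisk_gib is hypothetical.

```shell
# Hypothetical helper: estimate a ramdisk size in GiB for a BLAST database.
# Assumption: the ramdisk must hold every file named <prefix>.* plus
# 20 % headroom. Adjust the headroom to your own experience.
estimate_ramdisk_gib() {
    prefix="$1"
    total=0
    for f in "$prefix".*; do
        # skip the unexpanded pattern if no file matches
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    gib=1073741824
    # add 20 % headroom, then round up to whole GiB
    echo $(( (total * 12 / 10 + gib - 1) / gib ))
}
```

Calling estimate_ramdisk_gib /path/to/database_prefix prints a whole number of GiB that can serve as a starting value for the ramdisk option; the path is a placeholder.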

Calling LA_Wrapper -h will display a help message with all the options the script provides. Likewise, calling LA_Wrapper --credits will display credits and a version history.

The wrapper script submits a job itself. It is not intended to be run within a SLURM environment (i.e. inside a job), but rather creates one.

After loading the module, the script can be run like this:

$ LA_Wrapper [options] <fastafile> <database>

About Arguments:

  • fastafile needs to be a (relative) path to a file containing all inputs in FASTA format.
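As a concrete instance of the call pattern, here is a hedged sketch of a dry run. The file name and database path are placeholders, and the function name dry_run_search is hypothetical. The option names are taken from the option list on this page; that they accept their values separated by a space, and that blastp is an accepted value for the executable, are assumptions, so check LA_Wrapper -h first.

```shell
# Hypothetical convenience function around a dry run of LA_Wrapper.
# $1: query FASTA file, $2: database (both placeholders in the usage note)
dry_run_search() {
    query="$1"
    database="$2"
    if command -v LA_Wrapper >/dev/null 2>&1; then
        # --test requests a dry run; the remaining options are examples only
        LA_Wrapper --test --executable blastp --threads 8 --splitup 50 \
            "$query" "$database"
    else
        echo "LA_Wrapper not found; load the bio/parallel_BLAST module first" >&2
        return 1
    fi
}
```

A call would look like dry_run_search queries.fasta /path/to/reference_db, issued on a login node, because the wrapper creates the SLURM job itself.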

The options:

  • LA_Wrapper attempts to deduce your SLURM account. This may fail, in which case -A, --account needs to be supplied.
  • -l,--runlimit, run limit in minutes (default is 300)
  • --reservation, reservation to use (none is the default)
  • --time, time in minutes (300 is the default)
  • -r,--ramdisk, ramdisk size in units of GiB (default is 40 GiB)
  • -t,--threads, blast threads (default is 8)
  • --blastparams, blast parameters (default is -outfmt 5, i.e. XML output)
  • -s,--splitup, number of FASTA sequences per query file (default is 20)
  • --blastdir, output directory (default is a composition of the input names)
  • --executable, choose executable (currently only from NCBI-BLAST, default: blastx)
  • --compress, if set, the output files will be merged and compressed (time consuming!, default: off)
  • --test, --no-test, dry run, testing only (off by default)
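To illustrate what splitting the input into query files of a fixed number of sequences means, here is a minimal sketch using awk. It is not the wrapper's own code, only an illustration of the idea; the function name split_fasta and the output naming scheme are hypothetical.

```shell
# Hypothetical illustration: split a FASTA file into chunks of n sequences.
# $1: FASTA file, $2: sequences per chunk, $3: output prefix
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # every n-th header starts a new chunk file
            if (count % n == 0) {
                if (out) close(out)
                out = sprintf("%s_%04d.fasta", prefix, count / n)
            }
            count++
        }
        # write header and sequence lines to the current chunk;
        # lines before the first header are ignored
        out { print > out }
    ' "$1"
}
```

With 5 sequences and a chunk size of 2, split_fasta input.fasta 2 chunk writes chunk_0000.fasta and chunk_0001.fasta with two sequences each and chunk_0002.fasta with the remaining one. Smaller chunks give more, shorter searches, larger chunks fewer, longer ones.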

Output:

  • One analysis per input as specified by the respective executable.

Last modified: 2019/03/13 14:04 by meesters