software:topical:lifescience:ngs_read_mapping_tools

software:topical:lifescience:ngs_read_mapping_tools [2018/09/17 14:24]
meesters [BarraCuda]
-====== NGS Read Mapping Software on Mogon ====== 
  
-<WRAP center round todo 65%> 
-This page is currently under construction.  
-</WRAP> 
- 
-As a first introduction to NGS alignment tools, we recommend reading this short [[https://www.ecseq.com/support/ngs/what-is-the-best-ngs-alignment-software|blog post]]. In other words: the list of supported tools may keep growing [[https://hpc.uni-mainz.de/high-performance-computing/service-angebot/softwareinstallation/|in response to your requests]], but it will never cover everybody's favorite tool.
-
-Apart from our own [[software:topical:lifescience:ngs_read_mapping_tools#Comparison_Benchmarks|benchmarks]], a first impression can be found in [[http://www.ecseq.com/support/benchmark.html|the same blog]].
-===== Software Options ===== 
- 
-==== BWA ==== 
- 
-BWA is a mapping tool designed particularly for mapping "low-divergent sequences against a large reference genome". Modules on Mogon can be found as((Loading a module without a version specification loads the most recent one.)):
- 
-''bio/BWA'' 
- 
-=== The Wrapper Script === 
- 
-<WRAP center round alert 90%> 
-The wrapper script is not installed yet.
-</WRAP> 
- 
-To scale the task from mapping one (or a few) samples to mapping several samples in parallel, we provide a wrapper script, which is available as a module:
- 
-''bio/parallel_BWA'' 
- 
-The code is under version control and hosted [[https://gitlab.rlp.net/hpc-jgu-lifescience/seq-analysis|internally, here]].
-
-<WRAP center round important 90%>
-The wrapper script submits a job itself: it is not intended to be run inside a SLURM job, but rather creates one.
-</WRAP>
-
-Calling ''parallel_BWA -h'' displays a help message listing all options the script provides. Likewise, ''parallel_BWA --credits'' displays the credits and a version history.
-
-After loading the module, the script can be run as follows:
- 
-<code bash> 
-$ parallel_BWA [options] <referencedir> <inputdir> 
-</code> 
- 
-<WRAP center round important 90%> 
-**Limitations**: 
- 
-  * The wrapper recognizes files with the suffixes "''*.gz''", "''*.fastq''" or "''*.fq''" and will always assume that they are FASTQ files (compressed or uncompressed).
-  * The number of processes (and therefore nodes) is limited to the number of samples.
-  * The wrapper only works for paired-end sequencing data, where the two files of a pair are designated by the strings "''_1''" and "''_2''" or "''_R1''" and "''_R2''", respectively.
-  * BWA does not scale well to big data. It is better to split the input into chunks of ~1GB.
-  * BWA does not scale well beyond a NUMA block (8 threads on Mogon I).
-  * There are only a few options, as internally the wrapper calls ''bwa mem'' and only sets up a few things to improve performance.
-</WRAP> 
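The chunking advice above can be sketched as follows. This is a minimal illustration, not part of the wrapper: the function name ''split_fastq'' and the chunk prefix are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per read. How many reads amount to roughly 1GB depends on the read length, so the number of reads per chunk is left as a parameter.

```bash
#!/usr/bin/env bash
# Split an uncompressed FASTQ file into chunks holding a fixed number of reads.
# A FASTQ record spans 4 lines, so the chunk size in lines is reads * 4.
split_fastq() {
    local input=$1 reads_per_chunk=$2 prefix=$3
    split -l "$(( reads_per_chunk * 4 ))" "$input" "$prefix"
}

# Demo on a tiny synthetic file: 5 reads, 2 reads per chunk.
workdir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$workdir/sample_1.fastq"

split_fastq "$workdir/sample_1.fastq" 2 "$workdir/sample_1.chunk_"
ls "$workdir"
```

For paired-end data, both files of a pair would need to be split with the same number of reads per chunk so that the mates stay in the same order in corresponding chunks.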
- 
-About Arguments: 
- 
-  * ''referencedir'' needs to be the (relative) path to a directory containing an indexed BWA reference 
-  * ''inputdir'' needs to be a (relative) path to a directory containing all inputs. Subdirectories and files containing the string ''unpaired'' are ignored; this is to support preprocessing with the [[software:topical:lifescience:trimmomatic|trimmomatic module]]. 
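
Before submitting, it can help to check both directories by hand. The sketch below is an assumption about what "indexed reference" and the naming rules amount to, not the wrapper's actual logic: ''has_bwa_index'' and ''list_pairs'' are hypothetical helper names, the index check looks for the five files that ''bwa index'' writes (''.amb'', ''.ann'', ''.bwt'', ''.pac'', ''.sa''), and the pairing only considers a ''_R1'' tag or a ''_1'' tag placed directly before the file extension.

```bash
#!/usr/bin/env bash
# Succeed if <dir> contains a complete set of "bwa index" files for one prefix.
has_bwa_index() {
    local dir=$1 bwt prefix ext
    for bwt in "$dir"/*.bwt; do
        [ -e "$bwt" ] || continue
        prefix=${bwt%.bwt}
        for ext in amb ann pac sa; do
            [ -e "$prefix.$ext" ] || continue 2   # incomplete set, try next prefix
        done
        return 0
    done
    return 1
}

# Print "forward<TAB>reverse" for each read pair found directly in <dir>.
# Subdirectories and names containing "unpaired" are skipped.
list_pairs() {
    local dir=$1 fwd name rev
    for fwd in "$dir"/*; do
        [ -f "$fwd" ] || continue
        name=${fwd##*/}
        case $name in
            *unpaired*) continue ;;
            *.gz|*.fastq|*.fq) ;;
            *) continue ;;
        esac
        case $name in
            *_R1*)  rev="${name%_R1*}_R2${name##*_R1}" ;;
            *_1.*)  rev="${name%_1.*}_2.${name##*_1.}" ;;
            *) continue ;;
        esac
        if [ -f "$dir/$rev" ]; then
            printf '%s\t%s\n' "$name" "$rev"
        fi
    done
}

# Demo with empty placeholder files.
refdir=$(mktemp -d)
indir=$(mktemp -d)
for ext in amb ann bwt pac sa; do touch "$refdir/genome.fa.$ext"; done
touch "$indir/a_R1.fastq.gz" "$indir/a_R2.fastq.gz" \
      "$indir/b_1.fq" "$indir/b_2.fq" \
      "$indir/a_R1_unpaired.fastq.gz"

has_bwa_index "$refdir" && echo "reference index looks complete"
list_pairs "$indir"
```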
- 
-The options: 
-  * ''parallel_BWA'' attempts to deduce your SLURM account. This may fail, in which case ''-A, --account'' needs to be supplied.
-  * ''-N,--nodes'' allows reserving more than one node (the default is 1). This may speed up the mapping; see the limitations above.
-  * ''-d,--dependency'' takes a comma-separated list of job IDs that the job will wait for before it starts.
-  * ''-l,--runlimit'' sets the run time limit; it defaults to 300 minutes.
-  * ''-p,--partition'' selects the partition; the default is ''nodeshort'', or ''parallel'' on Mogon2. No smp partition should be chosen.
-  * ''-t,--threads'' sets the number of threads, as BWA can work in parallel. Please consult the manual. The default is 8.
-  * ''-o,--outdir'' sets the output directory path (the default is the current working directory).
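
Putting the options together, a call might be assembled as in the following dry-run sketch. It only prints the command line for review and does not submit anything. The helper ''build_call'', the account name and all paths are hypothetical, and passing the run limit as a plain number of minutes is an assumption based on the default described above.

```bash
#!/usr/bin/env bash
# Assemble a parallel_BWA command line from named settings and print it.
# Only the option letters come from the list above; all values are placeholders.
build_call() {
    local account=$1 nodes=$2 threads=$3 minutes=$4 outdir=$5 refdir=$6 indir=$7
    printf 'parallel_BWA -A %s -N %s -t %s -l %s -o %s %s %s\n' \
        "$account" "$nodes" "$threads" "$minutes" "$outdir" "$refdir" "$indir"
}

# Two nodes, 8 threads per BWA process, 300 minutes run limit.
build_call my_account 2 8 300 mapped reference/hg38 trimmed_reads
```

The printed line can then be pasted into a shell in which the ''bio/parallel_BWA'' module has been loaded.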
-   
-   
- 
-==== BarraCuda ==== 
- 
-[[http://seqbarracuda.sourceforge.net/|Barracuda]] is a GPU-accelerated implementation of BWA.
- 
-=== The Wrapper Script === 
- 
-==== Razer3 ==== 
- 
-=== The Wrapper Script === 
- 
-==== Bowtie2 ==== 
- 
-[[https://www.nature.com/articles/nmeth.1923|Bowtie2]] is a well known read aligner with a focus on gapped alignments. 
- 
-As //preliminary// scaling tests indicate that the program can scale to a full node and is still reasonably fast, no wrapper script has been installed as a module so far((If you feel that a workflow could benefit from a wrapper, please approach us.)). Instead, a few samples are given:
- 
-=== A Sample Script === 
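
What follows is only a sketch of what such a job script could look like, not a tested site recipe. The account, the partition, the core count of 16 and all file names are assumptions to be replaced with your own values, and a suitable Bowtie2 module has to be loaded first (its name is not given here). The ''bowtie2'' options used are ''-p'' (threads), ''-x'' (index prefix), ''-1''/''-2'' (mate files) and ''-S'' (SAM output). If ''bowtie2'' or the input is not available, the script only prints the command.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=bowtie2_sample
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16       # assumption: adjust to the cores of one node
#SBATCH --time=300               # minutes
#SBATCH --partition=parallel     # assumption: pick a partition valid for you
#SBATCH --account=my_account     # hypothetical account name

# Hypothetical inputs: an index prefix built with bowtie2-build and one read pair.
index=reference/genome
fwd=reads/sample_R1.fastq.gz
rev=reads/sample_R2.fastq.gz
out=sample.sam

# Use all cores granted by SLURM; fall back to 16 outside of a job.
threads=${SLURM_CPUS_PER_TASK:-16}

set -- bowtie2 -p "$threads" -x "$index" -1 "$fwd" -2 "$rev" -S "$out"
cmd_line="$*"

if command -v bowtie2 >/dev/null 2>&1 && [ -e "$fwd" ] && [ -e "$rev" ]; then
    "$@"
else
    echo "dry run: $cmd_line"
fi
```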
- 
-==== STAR ==== 
- 
-=== The Wrapper Script === 
- 
-==== segemehl ==== 
- 
-[[https://www.ncbi.nlm.nih.gov/pubmed/24626854|segemehl]] appears to be a good alignment tool; it is mentioned here because of the blog comparison cited below.
- 
-<WRAP center round info 90%> 
-There will be no wrapper script for ''segemehl'': if this [[http://www.ecseq.com/support/benchmark.html|comparison]] bears any truth, the software might be really good, but it is also very memory-hungry, and several tens of GB per core is simply too much. If you want to try segemehl, be sure to write your own wrapper script (perhaps staging the reference into a local scratch directory, not the ramdisk) and to reserve sufficient memory. Be aware that the prolonged run time and the memory will be charged to your account.
-</WRAP> 
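
A starting point for such a self-written wrapper is sketched below. It only prints a job script. The function ''make_segemehl_job'', the variable ''LOCAL_SCRATCH'' for the node-local disk and all file names are hypothetical, the memory and time values are placeholders, and the ''segemehl.x'' options (''-d'' reference, ''-i'' index, ''-q'' reads, ''-t'' threads) should be checked against the help output of the installed version.

```bash
#!/usr/bin/env bash
# Print a SLURM job script that reserves memory explicitly and copies the
# reference and its index to node-local scratch before mapping.
make_segemehl_job() {
    local mem_gb=$1 minutes=$2 threads=$3 ref=$4 idx=$5 reads=$6
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=$threads
#SBATCH --mem=${mem_gb}G
#SBATCH --time=$minutes

# Assumption: LOCAL_SCRATCH names a node-local disk; /tmp is only a fallback.
stage=\${LOCAL_SCRATCH:-/tmp}/\$SLURM_JOB_ID
mkdir -p "\$stage"
cp "$ref" "$idx" "\$stage/"

segemehl.x -t $threads -d "\$stage/$(basename "$ref")" -i "\$stage/$(basename "$idx")" -q "$reads" > "$(basename "$reads").sam"
EOF
}

# Example: 64 GB, 600 minutes, 8 threads.
make_segemehl_job 64 600 8 ref/genome.fa ref/genome.idx reads/sample.fq
```

The printed script can be redirected to a file, reviewed, and then submitted with ''sbatch''.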
- 
-===== Comparison Benchmarks ===== 
- 
-<WRAP center round todo 65%> 
-This part needs some more time to be finished .... 
-</WRAP> 
  • Last modified: 2018/09/17 14:24
  • by meesters