software:topical:lifescience:ngs_read_mapping_tools

software:topical:lifescience:ngs_read_mapping_tools [2018/12/13 12:24]
meesters
software:topical:lifescience:ngs_read_mapping_tools [2019/10/24 15:48]
meesters [BarraCuda]
====== NGS Read Mapping Software on Mogon ======
  
As a first introduction to NGS alignment software tools, we recommend reading this short [[https://www.ecseq.com/support/ngs/what-is-the-best-ngs-alignment-software|blog post]]. In other words: the list of supported tools may keep growing [[https://hpc.uni-mainz.de/high-performance-computing/service-angebot/softwareinstallation/|due to your requests]], but it will never cover everybody's favorite tool; there are simply too many, and some are not worth having.
  
===== Software Options =====
  
''bio/BWA/<version>''
  
A wrapper to ease your workflow is described [[software:topical:lifescience:#standard_mappers|below]].
  
  
See [[:software:topical:lifescience:ngs_read_mapping_tools#gpu-based|below for a wrapper script]] to ease your workflow.
  
==== Minimap2 ====

[[https://github.com/lh3/minimap2|Minimap2]] is meant to be a replacement for ''bwa mem''. Modules are installed under:

''bio/minimap2''
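For orientation, here is a minimal sketch of how the ''bio/minimap2'' module might be used on paired-end short reads. It assumes that loading the module without a version picks a default one. The function name ''map_with_minimap2'' and its arguments are hypothetical. ''-ax sr'' is minimap2's preset for short genomic reads with SAM output.

<code bash>
#!/bin/bash
# Hypothetical helper (not part of the Mogon installation):
# map paired-end short reads with minimap2 and write SAM.
# Assumption: 'module load bio/minimap2' loads a default version.
map_with_minimap2() {
  local ref=$1 r1=$2 r2=$3 out=$4
  module load bio/minimap2
  # -a: SAM output, -x sr: preset for short genomic reads,
  # -t: threads (falls back to 1 outside a SLURM job)
  minimap2 -ax sr -t "${SLURM_CPUS_PER_TASK:-1}" "$ref" "$r1" "$r2" > "$out"
}
</code>

Usage, e.g. within a job script: ''map_with_minimap2 genome.fa sample_R1.fastq sample_R2.fastq sample.sam''. Minimap2 can index a FASTA reference on the fly, so the sketch needs no separate indexing step.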
  
  
''bio/SeqAn/<version>''
  
A wrapper to ease your workflow is described [[software:topical:lifescience:#standard_mappers|below]].
  
  
  
[[https://www.nature.com/articles/nmeth.1923|Bowtie2]] is a well-known read aligner with a focus on gapped alignments.

Module(s) can be found at:

''bio/Bowtie2/<version>''

A wrapper to ease your workflow is described [[software:topical:lifescience:#standard_mappers|below]].
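If you want to run Bowtie2 by hand, the following minimal sketch shows the two usual steps. It assumes that ''module load bio/Bowtie2'' loads a default version. The function name and file names are hypothetical. ''bowtie2-build'' writes the index files (''<prefix>.1.bt2'' and so on), and ''bowtie2 -x'' expects that same prefix.

<code bash>
#!/bin/bash
# Hypothetical helper: build a Bowtie2 index (once) and map paired-end reads.
# Assumption: 'module load bio/Bowtie2' loads a default version.
map_with_bowtie2() {
  local ref=$1 r1=$2 r2=$3 out=$4
  local index=${ref%.*}          # index prefix: FASTA name without extension
  module load bio/Bowtie2
  # build the index only once (large genomes get .bt2l instead of .bt2 files)
  if [[ ! -e $index.1.bt2 && ! -e $index.1.bt2l ]]; then
    bowtie2-build "$ref" "$index"
  fi
  # -x: index prefix, -1/-2: mate files, -p: threads, -S: SAM output
  bowtie2 -p "${SLURM_CPUS_PER_TASK:-1}" -x "$index" -1 "$r1" -2 "$r2" -S "$out"
}
</code>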
  
==== STAR ====
  
[[https://www.ncbi.nlm.nih.gov/pubmed/23104886|STAR]] is a well-known mapping tool for RNA-Seq data.

Module(s) can be found at:

''bio/STAR/<version>''

A wrapper to ease your workflow is described [[software:topical:lifescience:#standard_mappers|below]].
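A manual STAR run consists of generating a genome index and then mapping against it. The following minimal sketch assumes that ''module load bio/STAR'' loads a default version, that a GTF annotation is available, and that the reads are uncompressed paired-end FASTQ files. The function names are hypothetical. Be aware that genome generation can need considerable memory for large genomes.

<code bash>
#!/bin/bash
# Hypothetical helpers for STAR (RNA-Seq); assumption: 'module load bio/STAR'
# loads a default version.
star_index() {
  local fasta=$1 gtf=$2 genomedir=$3
  module load bio/STAR
  mkdir -p "$genomedir"
  STAR --runMode genomeGenerate --runThreadN "${SLURM_CPUS_PER_TASK:-1}" \
       --genomeDir "$genomedir" --genomeFastaFiles "$fasta" \
       --sjdbGTFfile "$gtf"
}

star_map() {
  local genomedir=$1 r1=$2 r2=$3 prefix=$4
  module load bio/STAR
  # for gzipped FASTQ files add: --readFilesCommand zcat
  # result: ${prefix}Aligned.sortedByCoord.out.bam
  STAR --runThreadN "${SLURM_CPUS_PER_TASK:-1}" --genomeDir "$genomedir" \
       --readFilesIn "$r1" "$r2" --outFileNamePrefix "$prefix" \
       --outSAMtype BAM SortedByCoordinate
}
</code>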
  
==== segemehl ====
''bio/segemehl/0.2.0-foss-2018a''
  
==== TopHat ====

[[https://ccb.jhu.edu/software/tophat/index.shtml|TopHat]] is a fast splice junction mapper for RNA-Seq reads.

Module can be found at:

''bio/TopHat/<version>''

<WRAP center round info 90%>
This program is not yet supported by the wrapper.
</WRAP>

==== yara ====
  
  
<code bash>
$ MapperWrapper --executable=<executable> [options] <referencedir> <inputdir>
</code>
  
  
  * ''referencedir'' needs to be the (relative) path to a directory containing an indexed BWA reference.
  * ''inputdir'' needs to be the (relative) path to a directory containing all input files. Subdirectories and files whose names contain the string ''unpaired'' are ignored; this supports preprocessing with the [[software:topical:lifescience:qc|quality check module]].
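To check beforehand which files would be picked up, the following small sketch approximates the rule stated above. It reads the rule as follows: only entries directly inside ''inputdir'' count, and names containing ''unpaired'' are skipped. This reading is an assumption, the sketch is not the wrapper's own code, and ''list_mapper_inputs'' is a hypothetical name.

<code bash>
#!/bin/bash
# Hypothetical helper: approximate which inputs the wrapper would consider.
list_mapper_inputs() {
  local inputdir=$1
  # -mindepth/-maxdepth 1: only direct entries, no descent into subdirectories
  # ! -type d: skip directories; ! -name: skip anything named *unpaired*
  find "$inputdir" -mindepth 1 -maxdepth 1 ! -type d ! -name '*unpaired*' \
    -printf '%f\n' | sort
}
</code>

Usage: ''list_mapper_inputs ./trimmed''. The function prints one file name per line.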
  
The options:
  * ''MapperWrapper'' attempts to deduce your SLURM account. This may fail, in which case ''-A, --account'' needs to be supplied.
  * ''--verbose, --no-verbose'' toggles verbose execution (off by default).
  * ''--executable'' (mandatory) designates the mapper executable; possible values are ''bwa'', ''bowtie2'' and ''yara''.
  * ''-d, --dependency'' takes a comma-separated list of job IDs; the job waits for these jobs to finish.
  * ''-l, --runlimit'' sets the run time limit; the default is 300 minutes.
  * ''-p, --partition'' defaults to ''nodeshort'' (''parallel'' on Mogon2); do not choose an smp partition.
  * ''-o, --outdir'' sets the output directory path (default: the current working directory).
  * ''--tag'' sets an optional tag/prefix for log files and directories.
  * ''--groups'' provides a list of read group tags; the number of tags must equal the number of inputs.
  * ''--single'' (no arguments) evaluates single-end data.
  * ''--args'' supplies additional flags to the mapper, e.g. ''--args="-l 1024 -n 0.02"'' for BWA. Note the quotation marks; they are necessary.
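The following hypothetical invocation uses only the options listed above. The directory names are placeholders. The sketch builds the command as a bash array, which keeps the quoted ''--args'' value intact as a single argument. It only prints the command when ''MapperWrapper'' is not on the ''PATH''.

<code bash>
#!/bin/bash
# Hypothetical example: paired-end BWA run through the wrapper.
refdir=reference   # placeholder: directory with the indexed reference
inputdir=trimmed   # placeholder: directory with the FASTQ inputs
cmd=(MapperWrapper --executable=bwa -l 600 -o mapped
     --args="-l 1024 -n 0.02"
     "$refdir" "$inputdir")
# run the wrapper if available, otherwise just show the command line
if command -v MapperWrapper >/dev/null 2>&1; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
</code>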
  
  * For paired sequencing data, one BAM file is written per input pair, named with the prefix of the input. For single-end data, one BAM file is written per input file.

=== Generating Read Group Tags ===

Read group tags can be inserted with the ''--groups'' flag((From version 0.6 onward.)). The tags are supplied as a list on the command line. Example code to generate a tag list for consecutively numbered samples:
 +
<code bash>
# define the input directory appropriately, e.g. in a master script;
# '_R1' is assumed to mark the forward reads in a paired-end scenario
inputdir=/some/path/to/your/data

# a template: may deviate from project to project; '+ID+' is the placeholder
template='@RG\tID:+ID+\tLB:unknown_lb\tPL:illumina\tSM:sample+ID+'
# the tag list to be generated
tags=""
# number of samples (forward read files, excluding 'unpaired' ones)
nsamples=$(find "$inputdir" -name '*_R1*.fastq' | grep -v unpaired | wc -l)
# now the actual generation: one tag per sample, numbered consecutively
for ((i = 1; i <= nsamples; i++)); do
  tags="$tags $(sed -e "s/+ID+/$i/g" <<< "$template")"
done
</code>
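The same list can be produced without calling ''sed'' once per sample. Bash parameter expansion (''%%${template//+ID+/$i}%%'') replaces every placeholder directly. The function below, ''make_rg_tags'' (a hypothetical name), prints one tag per line. It uses the same template and file pattern as the snippet above.

<code bash>
#!/bin/bash
# Hypothetical helper: print one read group tag per forward-read file,
# numbered consecutively, using the template from the snippet above.
make_rg_tags() {
  local inputdir=$1 i nsamples
  local template='@RG\tID:+ID+\tLB:unknown_lb\tPL:illumina\tSM:sample+ID+'
  nsamples=$(find "$inputdir" -name '*_R1*.fastq' | grep -v unpaired | wc -l)
  for ((i = 1; i <= nsamples; i++)); do
    # replace every '+ID+' placeholder with the sample number
    printf '%s\n' "${template//+ID+/$i}"
  done
}
</code>

Usage: ''tags=$(make_rg_tags "$inputdir" | tr '\n' ' ')'' yields a space-separated list like the one from the original snippet.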
  
  
  
<WRAP center round important 90%>
**Considerations**:
  * See also the [[software:topical:lifescience:ngs_read_mapping_tools#standard_mappers|"standard" mappers]].
  * Also: the script only uses the ''m2_gpu'' partition and therefore requires an account with the ''m2_'' prefix((Supporting the wild "zoo" of hardware and partition settings is hardly worth the development effort for this software, as tests show that standard BWA (properly mapped to the hardware) outperforms the GPU version.)).
</WRAP>
  
About Arguments:
  * ''referencedir'' needs to be the (relative) path to a directory containing an indexed BWA reference. No symbolic links are allowed.
  * ''inputdir'' needs to be the (relative) path to a directory containing all input files. Subdirectories and files whose names contain the string ''unpaired'' are ignored; this supports preprocessing with the [[software:topical:lifescience:qc|quality check module]].
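Symbolic links are not allowed for ''referencedir'', so a quick pre-flight check may save a failed GPU job. Below is a minimal sketch. ''check_no_symlinks'' is a hypothetical name. The sketch treats both a linked directory and links inside it as violations, which is an assumption about what the wrapper rejects.

<code bash>
#!/bin/bash
# Hypothetical pre-flight check: fail if the reference directory is a
# symbolic link or contains any.
check_no_symlinks() {
  local refdir=${1%/}   # drop a trailing slash so -L tests the link itself
  if [[ -L $refdir ]]; then
    echo "error: '$refdir' is a symbolic link" >&2
    return 1
  fi
  # -print -quit stops at the first link found
  if [[ -n $(find "$refdir" -type l -print -quit) ]]; then
    echo "error: '$refdir' contains symbolic links" >&2
    return 1
  fi
}
</code>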
  
  
  
===== Comparison Benchmarks =====

Pending our own [[software:topical:lifescience:ngs_read_mapping_tools#Comparison_Benchmarks|benchmarks]], a first impression can be found in [[http://www.ecseq.com/support/benchmark.html|the ecSeq blog]].
  
<WRAP center round todo 65%>
This part needs some more time to be finished.
</WRAP>