Quality Assessment of NGS Data

Checking the quality (see footnote 1) of experimental data is crucial to data analysis.

FastQC

One of the best-known tools for estimating the sequencing quality (and providing summary statistics and plots) is FastQC.

We provide this software as modules under:

bio/FastQC

A wrapper to ease your workflow is described below.

You can view the HTML files created by FastQC with Firefox, which is installed on both clusters.
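As a minimal sketch of a manual run, the function below loads the module and writes one FastQC report per input file. The function name run_fastqc and both directory arguments are hypothetical; the module name bio/FastQC is taken from the listing above and may need a version suffix on your system.

```shell
# Sketch only: run FastQC on every FASTQ file in a directory.
# run_fastqc and its two directory arguments are illustrative names.
run_fastqc() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir" || return 1       # create the report directory up front
    module load bio/FastQC || return 1   # module name as listed above (assumed default version)
    for f in "$indir"/*.fastq "$indir"/*.fq "$indir"/*.gz; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        fastqc --outdir "$outdir" "$f"
    done
}
```

A call such as `run_fastqc raw_reads fastqc_reports` (directory names are placeholders) could then be followed by `firefox fastqc_reports/*.html` to inspect the reports.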

MultiQC is an assessment tool to gather the output of multiple quality indicators and to visualize them.

It is available as a module file:

bio/MultiQC

It can be run in a job to complete a pipeline with a quality assessment.
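One way to run MultiQC in a job is to wrap the call with sbatch, as sketched below. The helper name submit_multiqc, the 30-minute time limit and the output subdirectory are assumptions for illustration; account and partition have to be supplied by you.

```shell
# Sketch only: submit MultiQC as a SLURM job over a directory of reports.
# submit_multiqc is a hypothetical helper, not part of any provided module.
submit_multiqc() {
    account=$1      # your SLURM account
    partition=$2    # a partition you may use
    reports=$3      # directory holding the reports to aggregate, e.g. FastQC output
    sbatch --account="$account" --partition="$partition" --time=30 \
           --job-name=multiqc \
           --wrap="module load bio/MultiQC && multiqc --outdir '$reports/multiqc' '$reports'"
}
```

For example, `submit_multiqc <account> <partition> fastqc_reports` would leave the aggregated report in fastqc_reports/multiqc.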

We provide a wrapper module on Mogon in order to aggregate jobs and to integrate the quality check into a workflow.

The wrapper script is available as a module:

bio/parallel_QATools

The code is under version management and hosted internally, here.

The wrapper script submits a job itself: it is not intended to be run from within a SLURM job, but rather creates one.

Calling QAWrapper -h displays a help message with all the options the script provides. Likewise, QAWrapper --credits displays credits and a version history.

After loading the module, the script can be run like this:

$ QAWrapper --executable=<executable> [options] <inputdir>

Different meanings of the selected executable

The default executable, fastqc, assesses the quality of the “raw” data in FASTQ format. Other executables are available as well, such as samtools, which is invoked on .bam files to summarize the quality of the output of mapping tools.

See below for a detailed description. If a particular executable is not supported, please contact us.

About Arguments:

  • inputdir needs to be a (relative) path to a directory containing all inputs. Subdirectories and files containing the string unpaired are ignored; this is to support preprocessing with the quality assessment module.
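To preview which files in a directory would plausibly be picked up, the selection rule can be approximated with find. This is an illustration, not the wrapper's actual code: the helper name preview_inputs is hypothetical, the recursion into subdirectories is an assumption, and the file endings are the ones listed under Executables.

```shell
# Sketch only: approximate the input selection described above.
# preview_inputs is a hypothetical helper; it does not call QAWrapper.
preview_inputs() {
    # the --executable value is case-insensitive, so normalize it here as well
    exe=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    dir=$2
    case $exe in
        fastqc)   set -- -name '*.fastq' -o -name '*.fq' -o -name '*.gz' ;;
        samtools) set -- -name '*.bam' ;;
        *) echo "unsupported executable: $exe" >&2; return 1 ;;
    esac
    # drop every file whose path contains "unpaired" (file name or subdirectory)
    find "$dir" -type f ! -path '*unpaired*' \( "$@" \) | sort
}
```

Note that this filter also excludes everything if the input directory name itself contains the string unpaired.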

The options:

  • QAWrapper attempts to deduce your SLURM account. This may fail, in which case -A, --account needs to be supplied.
  • --executable, defaults to fastqc. Other option: samtools. The value is case-insensitive.
  • -l, --runlimit, defaults to 300 minutes.
  • -p, --partition, the default is nodeshort, or parallel on Mogon2; no smp partition should be chosen.
  • --args, arguments otherwise not set by the wrapper; the defaults of the chosen executable apply for unset arguments.
  • -d, --dependency, comma-separated list of job IDs that the job will wait for before it starts.
  • -o, --outdir, output directory path (default is the current working directory).
  • --constraint, on Mogon II only: defaults to broadwell.
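Putting several of these options together, a call for a directory of mapped reads might look like the sketch below. The helper name qa_bam_dir, the 60-minute run limit and the directory arguments are assumptions; the `--name=value` spelling is assumed to work for all long options because the usage line shows it for --executable.

```shell
# Sketch only: assess a directory of .bam files through the wrapper.
# qa_bam_dir is a hypothetical helper; the flags are those listed above.
qa_bam_dir() {
    account=$1    # only needed if the account cannot be deduced
    bamdir=$2     # input directory with mapped .bam files
    outdir=$3     # where the reports should go
    module load bio/parallel_QATools || return 1
    QAWrapper --executable=samtools \
              --account="$account" \
              --runlimit=60 \
              --partition=parallel \
              --outdir="$outdir" \
              "$bamdir"
}
```

Remember that QAWrapper submits the job itself, so a call like `qa_bam_dir <account> mapped_reads qa_reports` belongs on a login node rather than inside a job script.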

Executables:

Selected | Recognized Files | Purpose | Invocation
fastqc | FASTQ files ending in .fastq, .fq or .gz | assess quality of raw or trimmed data |
samtools | mapped .bam files | assess quality of mapped (sorted or filtered) compressed data | samtools flagstat
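For comparison, the samtools invocation named in the table can be run by hand on each file, roughly as follows. The helper name flagstat_all and the output file naming are hypothetical, and samtools is assumed to be on the PATH already (for instance after loading a suitable module).

```shell
# Sketch only: run "samtools flagstat" on every .bam file in a directory.
# flagstat_all and the .flagstat.txt naming are illustrative choices.
flagstat_all() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for bam in "$indir"/*.bam; do
        [ -e "$bam" ] || continue    # skip if no .bam file matched
        # one summary per input, named after the .bam file
        samtools flagstat "$bam" > "$outdir/$(basename "$bam" .bam).flagstat.txt"
    done
}
```

This yields one summary file per input, mirroring the one-analysis-per-input behaviour described under Output.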

Output:

  • One analysis per input as specified by the respective executable.

1) Quality checks or assessments are different from quality control or enhancement.
  • Last modified: 2019/04/03 10:42
  • by meesters