Question

Fastq Quality Filter Parameters

0

11.8 years ago

madbessoul ▴ 250

I've been working on NGS data quality control (on Whole Genome Bisulfite Sequencing data of IMR90 stem cells, to be precise) using the FASTX Toolkit. I am struggling to find the optimal parameters for the fastq_quality_filter command, which is part of my QC pipeline and is documented as follows:

$ fastq_quality_filter -h
    usage: fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-q N]       = Minimum quality score to keep.
   [-p N]       = Minimum percent of bases that must have [-q] quality.
   [-z]         = Compress output with GZIP.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
   [-v]         = Verbose - report number of sequences.
          If [-o] is specified,  report will be printed to STDOUT.
          If [-o] is not specified (and output goes to STDOUT),
          report will be printed to STDERR.

Any clues on how to determine the optimal values for -p and -q from the quality statistics?

Thank you !

ngs qualitycontrol fastq • 6.6k views

ADD COMMENT • link updated 11.8 years ago by Istvan Albert 102k • written 11.8 years ago by madbessoul ▴ 250

score 0 · Answer 1 · 2013-02-22

0

11.8 years ago

Istvan Albert 102k

I recommend solving the problem backwards.

First plot the quality values for the unfiltered data; by looking at those plots you can then evaluate the effect that various filtering options will have.

You can use FastQC or any other tool to generate quality plots.
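To make the "work backwards" idea concrete, here is a minimal sketch (not part of the FASTX Toolkit) that estimates what fraction of reads would survive for a grid of -q and -p values. It assumes four-line FASTQ records with Phred+33 quality encoding, and it assumes that a read is kept when at least -p percent of its bases have quality greater than or equal to -q, which is my reading of the help text above rather than something checked against the tool's source. The function names and the toy records are made up for illustration.

```python
def read_qualities(lines, offset=33):
    """Yield one list of Phred scores per FASTQ record.

    Assumes strict 4-line records and Phred+33 encoding (both assumptions).
    """
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of each record holds the quality string
            yield [ord(c) - offset for c in line.rstrip("\n")]


def passes(quals, min_q, min_pct):
    """Assumed rule: keep the read if >= min_pct percent of bases have quality >= min_q."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return 100.0 * good / len(quals) >= min_pct


def survival_table(reads, q_values, p_values):
    """Return {(q, p): fraction of reads kept} for every parameter combination."""
    reads = list(reads)
    table = {}
    for q in q_values:
        for p in p_values:
            kept = sum(passes(r, q, p) for r in reads)
            table[(q, p)] = kept / len(reads) if reads else 0.0
    return table


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    fastq = [
        "@r1", "ACGT", "+", "IIII",  # all bases Q40
        "@r2", "ACGT", "+", "II5#",  # Q40, Q40, Q20, Q2
        "@r3", "ACGT", "+", "####",  # all bases Q2
    ]
    table = survival_table(read_qualities(fastq), q_values=[20, 30], p_values=[50, 80, 100])
    for (q, p), frac in sorted(table.items()):
        print(f"-q {q} -p {p}: {frac:.0%} of reads kept")
```

For real data you would pass an open file handle instead of the toy list, for example `survival_table(read_qualities(open("reads.fastq")), ...)`, where `reads.fastq` is a placeholder name. Looking at how quickly the kept fraction drops as -q and -p increase, next to the FastQC plots, shows how much data each setting would cost you.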

ADD COMMENT • link 11.8 years ago by Istvan Albert 102k

0

Can you comment on what the gold standard is? For example, should all reads be above a Phred score of 32?

ADD REPLY • link 6.0 years ago by tbb21 ▴ 10