Fastq Quality Filter Parameters
11.8 years ago
madbessoul ▴ 250

I've been working on NGS data quality control (on Whole Genome Bisulfite Sequencing data of IMR90 stem cells, to be precise) using the FASTX Toolkit. I am struggling to find the optimal parameters for the fastq_quality_filter command, which is part of my QC pipeline and is documented as follows:

$ fastq_quality_filter -h
    usage: fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-q N]       = Minimum quality score to keep.
   [-p N]       = Minimum percent of bases that must have [-q] quality.
   [-z]         = Compress output with GZIP.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
   [-v]         = Verbose - report number of sequences.
          If [-o] is specified,  report will be printed to STDOUT.
          If [-o] is not specified (and output goes to STDOUT),
          report will be printed to STDERR.

Any clues about a way to find out (given quality statistics) the optimal values for -p and -q?

Thank you!

ngs qualitycontrol fastq • 6.6k views
11.8 years ago

I recommend solving the problem backwards.

First plot the quality values for the unfiltered data; then, looking at those plots, you can evaluate the effect that various filtering options will have.

You can use FastQC or any other tool to generate quality plots.
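To make the "work backwards" idea concrete, here is a minimal sketch that estimates what fraction of reads would survive for a grid of -q and -p values before you commit to a setting. It rests on assumptions: the keep/discard rule is my reading of the help text above (a read is kept when at least p percent of its bases have quality >= q), not the FASTX source, so rounding at the boundary may differ; qualities are assumed to be Phred+33 encoded (pass offset=64 for older Illumina data); records are assumed to be unwrapped 4-line FASTQ. All function names are made up for illustration.

```python
def read_quality_strings(handle):
    """Yield the quality line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 3:
            yield line.rstrip("\r\n")


def passes(qual, q, p, offset=33):
    """True if at least p percent of the bases have Phred quality >= q.

    This mirrors the -q/-p description in the help text; it is an
    assumption about the tool's behaviour, not a copy of its code.
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= q)
    # Integer comparison avoids floating-point trouble at the threshold.
    return good * 100 >= p * len(qual)


def retained_fraction(quals, q, p, offset=33):
    """Fraction of reads that would be kept for one (q, p) pair."""
    quals = list(quals)
    if not quals:
        return 0.0
    kept = sum(1 for s in quals if passes(s, q, p, offset))
    return kept / len(quals)


def sweep(quals, q_values, p_values, offset=33):
    """Map every (q, p) combination to the fraction of reads retained."""
    quals = list(quals)  # materialise once so the grid can reuse it
    return {
        (q, p): retained_fraction(quals, q, p, offset)
        for q in q_values
        for p in p_values
    }
```

For example, with a (hypothetical) file reads.fastq you could run `sweep(read_quality_strings(open("reads.fastq")), [20, 25, 30], [50, 80, 100])` on a subsample and pick the strictest pair that still keeps an acceptable share of reads, then confirm the choice with the real tool and a fresh FastQC plot.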


Can you comment on what the gold standard is? For example, should all reads be above a Phred score of 32?
