What Is The Minimum Quality Score
13.0 years ago
Love ▴ 100

Hello, I just created a FASTQ file with samtools. I am not confident about the quality of the reads, so I want to do some quality filtering, and I found the FASTX-Toolkit. Two questions:

1. I plan to use the FASTQ Quality Filter command:

fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]

   [-q N]       = Minimum quality score to keep.
   [-p N]       = Minimum percent of bases that must have [-q] quality.

What are proper values for "N"?

2. Before running this command, should we convert the multi-line FASTQ to the standard FASTQ format using fasta_formatter, or can fastq_quality_filter handle a multi-line FASTQ file directly?

fastx fastq quality • 13k views
13.0 years ago

Don't convert the file to FASTA, since you would lose the quality values that way.

The toolkit should be able to handle multi-line FASTQ just fine. Make sure to pass the -Q 33 flag (it sets the right offset for Sanger quality scores; sadly, this is an undocumented feature).
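To illustrate what the offset means: in Sanger-style (Phred+33) FASTQ, each quality character encodes a score as its ASCII code minus 33. Here is a minimal Python sketch; the function name `decode_quality` is made up for illustration and is not part of the toolkit.

```python
def decode_quality(qual_string, offset=33):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]

# '!' is ASCII 33 and 'I' is ASCII 73
print(decode_quality("!5?I"))  # [0, 20, 30, 40]
```

If the wrong offset is assumed, every score is shifted by the same amount, which is why the flag matters.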

Quality values range from 0 to about 40 and reflect the chance of a base being called incorrectly. Note that these values are estimates rather than actual measurements.
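For reference, the standard Phred relationship between a quality value Q and the estimated error probability is P = 10^(-Q/10). A small Python sketch (the function name `error_probability` is just an illustrative choice):

```python
def error_probability(q):
    """Estimated probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)

# Print the estimated error rate for a few common thresholds.
for q in (10, 20, 30, 40):
    print(f"Q{q}: {error_probability(q):.4%} chance of error")
```

So Q20 corresponds to roughly 1 error in 100 calls and Q30 to roughly 1 in 1000.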

Usually qualities over 30 are considered good and those under 20 are considered bad. At this point you need to come up with your own condition for how many bases are allowed to be good/bad for a read to pass quality control. In general it pays to be strict rather than loose with your quality control, but this depends on the application.
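To make the -q / -p condition concrete, here is a rough Python sketch of that kind of rule: a read passes if at least a given percentage of its bases reach a minimum quality. This is only my reading of the option descriptions, not the toolkit's actual code; the function name, the use of >= for both comparisons, and the handling of empty reads are assumptions.

```python
def passes_filter(quals, min_quality, min_percent):
    """True if at least min_percent % of the bases have quality >= min_quality."""
    if not quals:
        return False  # assumption: an empty read never passes
    good = sum(1 for q in quals if q >= min_quality)
    return 100.0 * good / len(quals) >= min_percent

# Made-up decoded quality scores for one 8-base read.
read = [38, 37, 35, 30, 22, 12, 8, 2]
print(passes_filter(read, min_quality=20, min_percent=50))  # True: 5 of 8 bases (62.5%)
print(passes_filter(read, min_quality=20, min_percent=80))  # False
```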

For de novo assembly and SNP calling it is more important to keep good reads than for, say, a ChIP-Seq application, where there is a reference sequence and all people need is the coordinate of each read.


Note that from the definition of Phred scores (i.e. quality values), a score of 0 means a 100% probability of the base call being incorrect. So assigning bases at random gives a score of about 1.2, and a score of 3 means a 50% chance of being correct. Just in case you feel pedantic today. :-) I think Illumina uses these very low scores to signal special cases, e.g. a run of 'B's indicates the end of the clone.
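These numbers can be checked by inverting the Phred definition, Q = -10 * log10(P), where P is the error probability. A quick Python sketch (the function name `phred_score` is just for illustration):

```python
import math

def phred_score(p_error):
    """Phred score for a given probability that the base call is wrong."""
    return -10 * math.log10(p_error)

# A base picked at random from four is wrong 3 times out of 4.
print(round(phred_score(0.75), 2))  # 1.25
# A 50% chance of being wrong.
print(round(phred_score(0.5), 2))   # 3.01
```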


So, in your experience, what is a typical value for the minimum percent of bases that must have [-q] quality?


That is a decision you have to make, Love. The best approach is to try out a few values and see what effect each has on your data. Also see this.
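One way to try out a few settings is to count how many reads survive each combination before committing to one. A toy Python sketch: the quality scores below are made up, `count_passing` is a hypothetical helper, and the pass rule is only my reading of the -q / -p descriptions, not the toolkit's code.

```python
# Made-up decoded quality scores for three 8-base reads.
reads = [
    [38, 38, 37, 36, 35, 34, 30, 25],
    [35, 33, 30, 28, 22, 18, 12, 5],
    [20, 18, 15, 12, 10, 8, 5, 2],
]

def count_passing(reads, min_quality, min_percent):
    """Count reads where at least min_percent % of bases have quality >= min_quality."""
    kept = 0
    for quals in reads:
        good = sum(q >= min_quality for q in quals)
        if 100.0 * good / len(quals) >= min_percent:
            kept += 1
    return kept

# Sweep a small grid of thresholds and report how many reads survive each.
for q in (20, 30):
    for p in (50, 80):
        kept = count_passing(reads, q, p)
        print(f"-q {q} -p {p}: {kept} of {len(reads)} reads kept")
```

On real data you would look at how quickly the number of surviving reads drops as the thresholds get stricter.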
