Hello, I just created a FASTQ file with samtools. I am not confident about its quality, so I want to run some quality checks and filtering on it. I found the FASTX-Toolkit and have two questions.
The first is about the FASTQ Quality Filter command:
fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]
[-q N] = Minimum quality score to keep.
[-p N] = Minimum percent of bases that must have [-q] quality.
What are proper values of N for -q and -p?
Second, before running this command, should we first convert a multi-line FASTQ file to the standard four-line FASTQ format (for example with fasta_formatter), or can fastq_quality_filter handle multi-line FASTQ files itself?
Note that from the definition of Phred scores (i.e. quality values), a score of 0 means a 100% probability that the base call is incorrect. A randomly assigned base (a 75% chance of being wrong) therefore has a score of about 1.2, and a score of 3 means a 50% chance of being correct. Just in case you feel pedantic today. :-) I think Illumina uses these very low scores to signal special cases, e.g. a run of 'B's indicates the end of the clone.
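To make the numbers above easy to check, here is a minimal Python sketch of the standard Phred relation Q = -10 * log10(P), where P is the probability that the base call is wrong. The function names are my own and are not part of any toolkit.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for a given probability of a wrong base call."""
    return -10 * math.log10(p)

if __name__ == "__main__":
    # A few scores and the error probabilities they imply.
    for q in (0, 3, 10, 20, 30):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
    # A randomly assigned base is wrong 3 times out of 4.
    print(f"random base: Q = {error_prob_to_phred(0.75):.2f}")
```

By the same formula, Q20 corresponds to a 1% error rate and Q30 to 0.1%, which may help when picking a value for -q.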
So, in your experience, what is the usual minimum percent of bases that must have [-q] quality?
That is a decision you have to make, Love. The best approach is to try out a few values and see what effect each has on your data. Also see this.
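If you want to get a feel for the thresholds before running the real tool, here is a rough Python sketch of the rule as the usage text describes it: keep a read when at least p percent of its bases have quality q or higher. This is my own approximation, not the toolkit's code. The function names are hypothetical, the quality strings are made-up toy data, and the ASCII offset of 33 is an assumption (older Illumina files use 64, which is where the 'B' scores come from).

```python
def passes_filter(quality_string, min_quality, min_percent, offset=33):
    """Return True if enough bases in one read reach min_quality.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding);
    use offset=64 for older Illumina files.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    if not scores:
        return False  # an empty read cannot satisfy the threshold
    good = sum(1 for s in scores if s >= min_quality)
    return 100 * good / len(scores) >= min_percent

def survival_rate(quality_strings, min_quality, min_percent, offset=33):
    """Fraction of reads that would be kept for a given (q, p) pair."""
    kept = sum(
        passes_filter(qs, min_quality, min_percent, offset)
        for qs in quality_strings
    )
    return kept / len(quality_strings)

if __name__ == "__main__":
    # Toy quality strings, invented for illustration only.
    toy_reads = ["IIIIIIIIII", "IIIII#####", "5555555555", "##########"]
    for q, p in [(20, 80), (20, 50), (30, 50)]:
        rate = survival_rate(toy_reads, q, p)
        print(f"-q {q} -p {p}: {rate:.0%} of reads kept")
```

Running the same comparison on the quality lines of a sample of your own reads shows how quickly the kept fraction drops as you raise -q or -p.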