Hello there,
I have a lot of MiSeq data that I am trying to analyze. FastQC flags a lot of failures in the report it generates, and I wanted to pick your brains to get a sense of what should be done in that case.
As you all know, FastQC generates this kind of summary:
[PASS] Basic Statistics
[FAIL] Per base sequence quality
[PASS] Per sequence quality scores
[FAIL] Per base sequence content
[FAIL] Per base GC content
[WARNING] Per sequence GC content
[PASS] Per base N content
[WARNING] Sequence Length Distribution
[FAIL] Sequence Duplication Levels
[WARNING] Overrepresented sequences
[FAIL] Kmer Content
The issue here is that I am analyzing targeted sequencing data, so I expect a lot of duplication. What I don't clearly see is whether to take the FastQC results at face value, judged against the standard published on their website (what a good report and a bad report should look like). Given the type of experiment (deep sequencing), I also expect the GC content to go crazy with that amount of duplication. Based on the example above, do you think FASTQ post-processing such as clipping and trimming would correct the reads, or is it already failing at the level of the MiSeq machine (experimental contamination?)?
Rad
Thanks Sean. What I do have indeed is a per-base sequence quality that drops below 28 around positions 100-120 on 200 bp reads, so the question this raises is whether or not it is a primer problem. Besides, clipping the sequences to 50% of their length looks a bit brutal to me, don't you think?
I have seen only one run in 6 years that required clipping based on base position. If you want to clip, do so based on quality, not on position in the read. Also, keep in mind that those plots need to be read carefully, as even pretty bad plots typically have a large proportion of the reads with perfectly acceptable quality scores.
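To make the "clip on quality, not on position" point concrete, here is a minimal Python sketch of a sliding-window quality trim. It is only an illustration under stated assumptions: the function names (phred_scores, sliding_window_trim), the window size of 4 and the mean-quality threshold of 20 are made up for this example, and Phred+33 encoding is assumed. It is a simplified version of the idea, not the exact algorithm of any particular trimming tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (an assumption of this sketch).
    """
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20, offset=33):
    """Cut a read at the first window whose mean quality is too low.

    Scans from the 5' end and keeps everything before the first window
    whose mean Phred score falls below min_mean_q. The window size and
    threshold are illustrative defaults, not recommendations.
    """
    scores = phred_scores(qual, offset)

    # Reads shorter than one window are judged as a whole.
    if len(scores) < window:
        if scores and sum(scores) / len(scores) >= min_mean_q:
            return seq, qual
        return "", ""

    for start in range(len(scores) - window + 1):
        window_mean = sum(scores[start:start + window]) / window
        if window_mean < min_mean_q:
            # Keep only the bases before the failing window.
            return seq[:start], qual[:start]

    # No window failed: the read is kept at full length.
    return seq, qual


if __name__ == "__main__":
    # Toy read: six high-quality bases ('I' = Q40), then four poor ones ('#' = Q2).
    trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual)
```

With this kind of trim, a read that stays good to the end is kept at full length and only the reads that actually degrade get shortened, which is why it is gentler than cutting every read at a fixed position.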
I see,
Thanks for the tips. I am going to use the data, but will play with the quality threshold at the alignment level.
Thanks
Rad