Hello,
I am currently working with FASTQ files from exome sequencing with a coverage of 150x. After running FastQC on these files, I see quite high quality scores (~35 on average) with a very narrow distribution across all positions. This seems a little suspicious to me. Since I have very little experience with this type of data, I'd like to ask: is it normal to observe such high quality scores?
Thank you for your help
That is normal. You can even see that in the example FastQC report: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html
Thanks a lot, I hadn't noticed this same tendency in the example report
You can analyze the quality scores empirically if you want, via mapping; BBMap has several options for that:
mhist generates a histogram of matches and mismatches by base position; qhist gives claimed and measured quality per position; and qahist gives the quality-score accuracy (claimed versus observed). Sometimes the quality scores are quite accurate, sometimes not; it depends on a lot of factors including luck. But if you suspect they are wrong, it's nice to validate that.
Note that humans, being diploid with a roughly 1/1000 SNP rate, have a noise floor of around Q30 (30 dB) for these testing methods, because real variants show up as mismatches against the reference. The methods work better on haploids, but they will still be fairly accurate up to Q30.
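To make the Q30 noise floor concrete, here is a minimal Python sketch of the standard Phred relation Q = -10 * log10(p). The function names and the simple additive model (sequencing errors plus a fixed 1/1000 variant mismatch rate) are my own assumptions for illustration; this is not how BBMap computes its histograms.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def observed_quality(matches: int, mismatches: int) -> float:
    """Empirical quality from match/mismatch counts against a reference."""
    if mismatches == 0:
        return math.inf  # no mismatches seen, so no finite estimate
    return error_to_phred(mismatches / (matches + mismatches))


def apparent_quality(true_q: float, variant_rate: float = 1e-3) -> float:
    """Quality as it would appear via mapping when real variants
    (assumed rate: 1/1000, a simplification) also count as mismatches."""
    return error_to_phred(phred_to_error(true_q) + variant_rate)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(q, round(apparent_quality(q), 2))
```

Under these assumptions a true Q20 base is barely affected (it appears as roughly Q19.6), while a true Q40 base appears as about Q29.6, which is why measured quality flattens out near Q30 on human data.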
That is not surprising, if the libraries are of good quality (and the read length is not > 150).
Suspicious data you say? ಠ_ರೃ
Can we see the clues too?
It depends on which sequencing technology you used. If your data is from an Illumina HiSeq, I would say the quality is as expected. But if your data is from Nanopore, I would also find it suspicious.