Hi!
A company sequenced 20 different samples for our lab on the MiSeq platform (2x300). I have now received the data and done a first rough evaluation of the quality. There are huge differences in quality between the R1 and R2 reads. Is that normal (see images)?
The first image shows the R1 reads and the second the R2 reads. For each sample, I averaged the probability of a wrong base call over all reads.
The y-axis shows the probability of a wrong base call and the x-axis the base position. Be aware that the two y-axes do not have the same range (otherwise nothing would be visible in the R1 graph).
Important: in the R2 graph the x-axis is flipped (I already did this in a script so I can align the reads later...).
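The per-position averaging described above can be sketched roughly as follows. This is not Michael's script; it assumes Phred+33 quality encoding (standard for current Illumina/MiSeq FASTQ output), plain 4-line FASTQ records, and the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List


def fastq_qualities(path: str) -> Iterator[str]:
    """Yield the quality line of each record (assumes unwrapped 4-line FASTQ)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:  # 4th line of each record holds the qualities
                yield line.rstrip("\n")


def mean_error_per_position(qual_strings: Iterable[str], offset: int = 33,
                            reverse: bool = False) -> List[float]:
    """Average error probability at each base position over all reads.

    Assumes Phred+33 encoding. reverse=True flips the result, mirroring
    the flipped x-axis used for the R2 plot.
    """
    sums: List[float] = []
    counts: List[int] = []
    for qual in qual_strings:
        for pos, ch in enumerate(qual):
            q = ord(ch) - offset        # decode the Phred score
            p = 10 ** (-q / 10)         # Phred score -> error probability
            if pos == len(sums):        # grow the arrays for longer reads
                sums.append(0.0)
                counts.append(0)
            sums[pos] += p
            counts[pos] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return means[::-1] if reverse else means
```

Usage would be something like `mean_error_per_position(fastq_qualities("sample_R2.fastq.gz"), reverse=True)`, where the file name is only a placeholder.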
http://picpaste.com/pics/R1.1398695966.png
http://picpaste.com/pics/R2.1398696015.png
Thanks!
Michael
OK, here are the Phred scores. This time the x-axis of R2 is not flipped. What do you think? Is this a satisfactory result, considering that the sequencing was done by professionals?
http://picpaste.com/pics/R1_phred.1398701258.png
http://picpaste.com/pics/R2_phred.1398700926.png
Two additional pieces of information:
First, the distribution of Phred scores (image 1: R1, image 2: R2). Here I simply counted the occurrences of each Phred score per sample:
http://picpaste.com/pics/R1_Phreds_1.1398765452.png
http://picpaste.com/pics/R2_Phreds_1.1398765471.png
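Counting Phred score occurrences per sample, as described above, could look like the sketch below. It assumes Phred+33 encoding and that the quality strings of one sample are already available; `phred_histogram` and `to_fractions` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable


def phred_histogram(qual_strings: Iterable[str], offset: int = 33) -> Counter:
    """Count how often each Phred score occurs across all reads of a sample."""
    counts: Counter = Counter()
    for qual in qual_strings:
        counts.update(ord(ch) - offset for ch in qual)  # decode and count
    return counts


def to_fractions(hist: Counter) -> Dict[int, float]:
    """Normalise counts so samples with different read numbers are comparable."""
    total = sum(hist.values())
    return {q: n / total for q, n in sorted(hist.items())}
```

Normalising to fractions is optional, but it makes the R1 and R2 distributions of samples with different read counts easier to compare side by side.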
Second, I counted how many reads pass the following criterion:
150 bp with an average Phred score of at least 20 in a moving window of 3 bases (I guess that is not a very strict criterion).
For R1 90% to 95% of the reads pass
For R2 only about 0.01% pass
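The criterion above can be read in more than one way. One reading, assumed here, is: cut each read at the first 3-base window whose mean Phred score falls below 20, and count the read as passing if at least 150 bp remain. This is similar in spirit to a sliding-window trimmer such as Trimmomatic's SLIDINGWINDOW step, but not identical to it. Phred+33 encoding is assumed and the function names are hypothetical.

```python
from typing import Iterable


def trimmed_length(qual: str, window: int = 3, min_mean: float = 20.0,
                   offset: int = 33) -> int:
    """Length kept before the first window whose mean quality is < min_mean."""
    scores = [ord(ch) - offset for ch in qual]  # assumes Phred+33
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return start  # cut at the start of the first failing window
    return len(scores)


def passes(qual: str, min_length: int = 150, **kwargs) -> bool:
    """True if at least min_length bases survive the sliding-window cut."""
    return trimmed_length(qual, **kwargs) >= min_length


def fraction_passing(quals: Iterable[str], **kwargs) -> float:
    """Fraction of reads that pass; 0.0 for an empty input."""
    quals = list(quals)
    if not quals:
        return 0.0
    return sum(passes(q, **kwargs) for q in quals) / len(quals)
```

Under this reading, a pass rate of roughly 0.01% for R2 means almost every R2 read hits a low-quality window within its first 150 bases.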
I usually do see quality differences between read 1 and read 2 of a pair, but usually not this bad.
It would be interesting to see these charts only for bases with a quality > 20 or 30. Presumably, your axis is flipped on the R2.
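One way to act on the suggestion above, assumed here since the comment does not say exactly which chart is meant, is to plot, for each position, the fraction of bases whose Phred score exceeds 20 or 30. A sketch assuming Phred+33 encoding; `fraction_above` is a hypothetical name.

```python
from typing import Iterable, List


def fraction_above(qual_strings: Iterable[str], threshold: int = 30,
                   offset: int = 33) -> List[float]:
    """Per-position fraction of bases with Phred score strictly > threshold."""
    above: List[int] = []
    total: List[int] = []
    for qual in qual_strings:
        for pos, ch in enumerate(qual):
            if pos == len(total):  # grow the arrays for longer reads
                above.append(0)
                total.append(0)
            total[pos] += 1
            if ord(ch) - offset > threshold:
                above[pos] += 1
    return [a / t for a, t in zip(above, total)]
```

Unlike an averaged error probability, this curve is not dominated by a few very bad bases, so it can show whether R2 is uniformly poor or only has a low-quality tail.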
@brentp: Yes, sure! Stupid me, I forgot to mention that... But the R2 reads are still much worse than the R1 reads over the whole length... I guess that is still strange?
As Dan Gaston already said, it is normal to see some difference, but I also think this is more than one would expect. It is hard to say, because I normally look at the Phred scores directly without computing error probabilities. When I have seen a big difference between read 1 and read 2 quality, this has usually also meant that the second index read was of really low quality, with a lot of uncalled bases. Did you have index reads, and how do they look?
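A quick check for the uncalled bases mentioned above is to count `N` calls in the index reads. The sketch assumes the index read sequences are available (for example from a separate index FASTQ), treats `N` as an uncalled base, and uses the hypothetical name `n_stats`.

```python
from typing import Dict, Iterable


def n_stats(sequences: Iterable[str]) -> Dict[str, float]:
    """Fraction of N bases overall and fraction of reads with at least one N."""
    total_bases = n_bases = reads = reads_with_n = 0
    for seq in sequences:
        seq = seq.upper()            # treat lowercase n as uncalled too
        reads += 1
        total_bases += len(seq)
        k = seq.count("N")
        n_bases += k
        if k:
            reads_with_n += 1
    return {
        "n_fraction": n_bases / total_bases if total_bases else 0.0,
        "reads_with_n_fraction": reads_with_n / reads if reads else 0.0,
    }
```

Comparing these numbers between the index reads of the run would show whether the second index read has the problem described above.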
I second the previous comments: some variation is normal, and R2 is generally a little worse than R1, but your plots show that something went wrong in the R2 sequencing. Ask for a re-run or ignore R2 in your analysis.
It doesn't seem that odd to me that, in untrimmed reads, the error rate reaches up to 10% at the end of the 300 bases of the second read in the pair. But I usually don't look at the data like this, so maybe I'm mistaken.
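For reference, the standard Phred relation links the error probabilities in the first plots to the Phred scores in the later ones, so a 10% error rate corresponds to Q10. The worked example at the end uses made-up values to show why averaging error probabilities and averaging Phred scores give different pictures.

```latex
Q = -10 \log_{10} P \qquad\Longleftrightarrow\qquad P = 10^{-Q/10}

Q = 10 \Rightarrow P = 0.1 \;(10\%), \qquad
Q = 20 \Rightarrow P = 0.01, \qquad
Q = 30 \Rightarrow P = 0.001

% Two bases with Q = 40 and Q = 10:
\bar{Q} = 25 \;\Rightarrow\; 10^{-2.5} \approx 0.0032,
\qquad
\bar{P} = \tfrac{0.0001 + 0.1}{2} = 0.05005 \;\Rightarrow\; -10\log_{10}\bar{P} \approx 13.0
```

The mean error probability is dominated by the worst bases, which is one reason the averaged-probability plots can look much worse than a plot of mean Phred scores.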
I agree. I usually look at it plotted the same way FastQC does.
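FastQC's per-base quality plot shows, for each position, a box (25th to 75th percentile), whiskers at the 10th and 90th percentiles, the median, and the mean. The sketch below computes comparable per-position numbers from raw quality strings. It assumes Phred+33 encoding, uses a simple nearest-rank percentile (FastQC's exact percentile method and its position binning are not reproduced), and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def percentile(sorted_vals: List[int], pct: float) -> int:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def per_position_summary(qual_strings: Iterable[str],
                         offset: int = 33) -> List[Dict[str, float]]:
    """Box-plot style summary of Phred scores at each read position."""
    columns: List[List[int]] = []
    for qual in qual_strings:
        for pos, ch in enumerate(qual):
            if pos == len(columns):  # new position seen for the first time
                columns.append([])
            columns[pos].append(ord(ch) - offset)
    summary = []
    for col in columns:
        col.sort()
        summary.append({
            "p10": percentile(col, 10),
            "q1": percentile(col, 25),
            "median": percentile(col, 50),
            "q3": percentile(col, 75),
            "p90": percentile(col, 90),
            "mean": sum(col) / len(col),
        })
    return summary
```

Looking at the spread (box and whiskers) rather than a single average makes it easier to tell whether a whole read is bad or only a minority of bases.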