In the past, we've used an Illumina GA for our RNA-seq experiments. In general, we noticed that the reported quality of the read bases was highest at the 5' end of each read and dropped gradually towards the 3' end (according to the quality scores in the FASTQ files). This is what we expected.
Recently, however, we received an RNA-seq dataset generated on a HiSeq 2000 and noticed a different pattern. The 5' bases have high quality, but the quality actually improves in the 3' direction until about base 20 (of 90) and only then drops gradually.
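If you want to reproduce this kind of per-position profile yourself, here is a minimal Python sketch that computes the median Phred score at each read position of a FASTQ file. It assumes an uncompressed file with exactly four lines per record and an ASCII quality offset of 33 (Sanger / Illumina 1.8+). Older Illumina pipelines (before CASAVA 1.8) used an offset of 64, so check which encoding your files use. The function name `per_position_median_quality` is purely illustrative.

```python
import statistics
from itertools import islice


def per_position_median_quality(path, offset=33):
    """Return the median Phred score at each read position (index 0 = base 1).

    Assumptions (illustrative sketch only): plain-text FASTQ, exactly four
    lines per record, quality characters encoded as chr(Q + offset).
    Reads of different lengths are allowed; each position's median is taken
    over the reads that actually reach that position.
    """
    columns = []  # columns[i] collects every quality score seen at position i
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))  # header, sequence, '+', quality
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            quality = record[3].rstrip("\r\n")
            for i, char in enumerate(quality):
                if i == len(columns):
                    columns.append([])
                columns[i].append(ord(char) - offset)
    return [statistics.median(scores) for scores in columns]
```

For example, `per_position_median_quality("reads.fastq")` returns a list whose values you can plot against position (1, 2, 3, ...). For a 90-base run, this should show whether the median really rises over roughly the first 20 bases before declining.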
Could someone comment on whether this alternative pattern is just a harmless artifact of the HiSeq 2000 or whether it should be a cause for concern?
Thanks.
Just wanted to add that we've also seen the same pattern: something like an upside-down smile (a.k.a. a frown), where bases 1-4, 5-9 and 10-14 increase in a step-like fashion, followed by a "normal" Phred-like distribution with a gradual, slight decrease in scores towards the 3' end. We're doing 50 bp runs, and the median score at base 50 is still ~36 (out of 40), so all in all it's still quite good for us.
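For context on what these numbers mean: a Phred score Q relates to the estimated probability P that a base call is wrong as shown below. By this relation, a median of Q36 corresponds to roughly one miscalled base in 4,000.

$$Q = -10\log_{10} P \quad\Longleftrightarrow\quad P = 10^{-Q/10}$$

$$Q = 36:\quad P = 10^{-3.6} \approx 2.5\times10^{-4}$$

$$Q = 40:\quad P = 10^{-4}$$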
@steve: We also see a similar pattern: bases 1-3, 4-8 and 9-10 increase stepwise, then quality increases gradually up to 50-60 bp and slowly decreases towards the 3' end. We are running 104 bp reads, but overall read quality is good (median scores >32).