Hi
I have some Illumina RNA-seq data, 75 bp paired-end reads. I've run FastQC and Trimmomatic and then re-run FastQC. Both before and after Trimmomatic, quite a few of my samples fail the per base sequence content check, with the %A content running low in read 1 (e.g. http://i.imgur.com/Cv4HXJq.png) and the %T content running low in read 2 (e.g. http://i.imgur.com/ul7v55H.png). Read 2 tends to give a warning rather than a fail.
Per base quality is good across the board, per sequence GC content looks fine, and the only other fails I get are duplicated sequences and kmer content (both of which I gather are common fails in mRNA-seq data).
I've little experience with RNA seq QC - is this something to be concerned about?
Thanks for any advice.
Best,
Iain
Ah, strand-specific. So obvious now you've pointed that out. Thanks. I can't see any poly-A or oligo-T in the kmers.
If this answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. This helps future readers of your question to evaluate the response.
I was looking at a fungal genome once that seemed to consist of only 3 letters on each strand. IIRC, one strand was only A, C, and G and the other was only C, G, and T (at least in the areas I looked at). So if, as Harold suggests, your data is strand-specific, that particular organism might show this kind of pattern if one of the two strands was generally coding and the other wasn't.
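To make the strand argument concrete, here is a minimal Python sketch (not taken from anyone's actual pipeline) showing why a low %A in read 1 would go together with a low %T in read 2 in a strand-specific library. The sequences are made-up toy reads, and the helper names `reverse_complement` and `base_fractions` are just illustrative. As a simplification, read 2 is modelled as the reverse complement of read 1; in real data it comes from the other end of the fragment, but it is still read off the opposite strand.

```python
from collections import Counter

# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def base_fractions(reads):
    """Return the overall fraction of each base across a list of reads."""
    counts = Counter()
    for read in reads:
        counts.update(read)
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}

# Toy, invented "read 1" sequences that are deliberately low in A
read1 = ["CGTTGCTTGC", "TTGCCGTAGT", "GCTTTGCGTC"]

# Simplifying assumption: read 2 is the reverse complement of read 1
read2 = [reverse_complement(r) for r in read1]

f1 = base_fractions(read1)
f2 = base_fractions(read2)

for base in "ACGT":
    print(f"{base}: read1={f1[base]:.2f}  read2={f2[base]:.2f}")
```

Whatever fraction of A read 1 has, read 2 has the same fraction of T (and likewise C and G swap), so an A-poor sense strand would produce exactly the complementary pair of FastQC warnings described in the question.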