Question

FastQC low %A in per base sequence content

7.9 years ago

Duff ▴ 670

Hi

I have some Illumina RNA-seq data, 75 bp paired-end reads. I've run FastQC and Trimmomatic and then re-run FastQC. Both before and after Trimmomatic, quite a few of my files fail the per-base sequence content module, with %A running low in read 1 (e.g. http://i.imgur.com/Cv4HXJq.png) and %T running low in read 2 (e.g. http://i.imgur.com/ul7v55H.png). Read 2 tends to get a warning rather than a fail.

Per-base quality is good across the board, per-sequence GC content looks fine, and the only other fails are duplicated sequences and k-mer content (both of which, I gather, commonly fail for mRNA-seq data).

I've little experience with RNA-seq QC. Is this something to be concerned about?

Thanks for any advice.

Best,

Iain

RNA-Seq fastqc

updated 7.9 years ago by harold.smith.tarheel ★ 5.0k • written 7.9 years ago by Duff ▴ 670
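For anyone who wants to inspect the composition behind per-base sequence content plots like the ones above outside FastQC, a rough per-position tally can be computed directly from the reads. This is a minimal sketch, not FastQC's own code. It assumes an uncompressed FASTQ with standard 4-line records and uppercase bases, ignores N when computing percentages, and does not reproduce FastQC's grouping of positions for longer reads. The helper names `fastq_sequences` and `per_base_content` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def fastq_sequences(lines: Iterable[str]) -> List[str]:
    """Return the sequence line (2nd of every 4) from FASTQ text lines.

    Assumes well-formed, uncompressed 4-line records; blank lines are skipped.
    """
    records = [ln.rstrip("\n") for ln in lines if ln.strip()]
    return records[1::4]


def per_base_content(seqs: List[str]) -> List[Dict[str, float]]:
    """Percentage of A/C/G/T at each read position (N and other symbols ignored)."""
    if not seqs:
        return []
    max_len = max(len(s) for s in seqs)
    profile = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute
        # (after trimming, read lengths vary).
        counts = Counter(s[pos] for s in seqs if pos < len(s))
        total = sum(counts[b] for b in "ACGT")
        profile.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return profile


# Tiny in-memory example; in practice, pass an open file handle instead.
example = """@r1
ACGT
+
IIII
@r2
TCGA
+
IIII
@r3
GGGT
+
IIII
"""
for pos, pct in enumerate(per_base_content(fastq_sequences(example.splitlines())), start=1):
    print(pos, " ".join(f"{b}={pct[b]:.1f}" for b in "ACGT"))
```

Running it separately on the read 1 and read 2 files gives one profile per read, matching how FastQC reports them.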

score 2 · Accepted Answer · 2017-06-13

7.9 years ago

harold.smith.tarheel ★ 5.0k

This is the expected result if 1) your libraries are strand-specific and 2) they contain some poly(A)+ contamination. You should also observe oligo-T among the over-represented k-mer content.
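To see why strand-specificity alone can produce this mirror-image pattern, here is a toy simulation. It is a sketch under simplifying assumptions, not a model of any particular library protocol: it assumes read 1 always comes from the 5' end of one fixed strand of each fragment and read 2 from the 5' end of the opposite strand, as in a stranded library, and it mimics an unstranded library by swapping the two reads on every other fragment. The fragment sequence `CGTTCGTT` is made up and deliberately contains no A, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def base_fractions(seqs: Iterable[str]) -> Dict[str, float]:
    """Overall fraction of A/C/G/T across all reads (N ignored)."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def make_pairs(
    fragments: List[str], read_len: int, stranded: bool = True
) -> Tuple[List[str], List[str]]:
    """Hypothetical paired-end reads from double-stranded fragments."""
    r1, r2 = [], []
    for i, frag in enumerate(fragments):
        top = frag[:read_len]  # 5' end of the top strand
        bottom = reverse_complement(frag[-read_len:])  # 5' end of the bottom strand
        # Unstranded: either strand can become read 1. Alternate deterministically
        # (instead of randomly) so the example is reproducible.
        if not stranded and i % 2 == 1:
            top, bottom = bottom, top
        r1.append(top)
        r2.append(bottom)
    return r1, r2


fragments = ["CGTTCGTT"] * 4  # made-up, A-free "transcript" fragments
for label, stranded in [("stranded", True), ("unstranded", False)]:
    r1, r2 = make_pairs(fragments, read_len=8, stranded=stranded)
    print(label, "R1", base_fractions(r1), "R2", base_fractions(r2))
```

In the stranded case read 1 contains no A and read 2 no T, the same direction as the %A/%T pattern in the question; swapping strands on half the fragments evens A and T out. Real transcripts are far less extreme, and which read corresponds to the sense strand depends on the library protocol.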

Ah, strand-specific. So obvious now that you've pointed it out. Thanks. I can't see any poly(A) or oligo-T among the k-mers, though.

7.9 years ago by Duff ▴ 670
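As an independent check alongside FastQC's k-mer report, one can directly count reads that contain a long A or T homopolymer. This is a minimal sketch: the default minimum run length of 10 is an arbitrary assumption, and `count_homopolymer_reads` is a hypothetical helper, not part of FastQC.

```python
import re
from typing import Dict, Iterable


def count_homopolymer_reads(seqs: Iterable[str], min_run: int = 10) -> Dict[str, int]:
    """Count reads containing a run of at least `min_run` A's, or of T's."""
    poly_a = re.compile(f"A{{{min_run},}}")  # e.g. "A{10,}"
    poly_t = re.compile(f"T{{{min_run},}}")
    counts = {"total": 0, "polyA": 0, "polyT": 0}
    for s in seqs:
        counts["total"] += 1
        if poly_a.search(s):
            counts["polyA"] += 1
        if poly_t.search(s):
            counts["polyT"] += 1
    return counts


reads = ["ACGT" + "A" * 12, "T" * 10 + "GC", "ACGTACGTAC"]
print(count_homopolymer_reads(reads))  # {'total': 3, 'polyA': 1, 'polyT': 1}
```

Running it on read 1 and read 2 separately is worthwhile: in a strand-specific library, poly(A) tails would be expected to appear as A runs in one read and T runs in its mate.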

If this answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. This helps future readers of your question to evaluate the response.

7.9 years ago by harold.smith.tarheel ★ 5.0k

I was once looking at a fungal genome that seemed to consist of only three letters on each strand. IIRC, one strand was only A, C, and G and the other only C, G, and T (at least in the regions I looked at). So if, as Harold suggests, your data are strand-specific, an organism like that might show this kind of pattern if one of the two strands was generally coding and the other wasn't.

7.9 years ago by Brian Bushnell 20k