How stranded is the stranded RNA-Seq (TruSeq) protocol?
9.7 years ago
Leszek 4.2k

I have a question about stranded RNA-Seq with Illumina TruSeq. I'm comparing several stranded RNA-Seq runs from SOLiD and Illumina. The SOLiD stranded protocol looks perfect, with literally no reads aligning to the antisense strand. The Illumina sample, however, has quite a lot of reads on the antisense strand. Sense-strand reads are still enriched, with roughly 20 times more reads on the sense strand.

The antisense reads are uniquely aligned (MAPQ > 100, mapped with STAR). Even so, I don't think they reflect real antisense expression, because their exon-intron structure mirrors that of the sense reads. The sense strand is still strongly enriched (10-20x) in the Illumina data.

Is TruSeq designed only to enrich for the sense strand, or is it supposed to be a perfectly stranded protocol, as SOLiD was?

Is it possible that something went wrong during library prep?
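The sense/antisense ratio described above can be computed with a short script. The sketch below makes several assumptions. First, each single-end alignment is reduced to a hypothetical `(read_strand, gene_strand, mapq)` tuple for a read overlapping exactly one gene. Second, the library is reverse-stranded, which is the usual behaviour of dUTP-based kits such as TruSeq Stranded: read 1 aligns to the strand opposite the transcript, so "sense" means "on the expected strand". Third, the MAPQ > 100 cutoff from the post is applied. STAR assigns MAPQ 255 to unique mappers by default, so this cutoff keeps only those. The function names are illustrative.

```python
from typing import Iterable, Tuple

# Hypothetical record for one single-end alignment overlapping exactly one gene:
# (read_strand, gene_strand, mapq), strands given as "+" or "-".
Alignment = Tuple[str, str, int]


def strand_counts(alignments: Iterable[Alignment],
                  reverse_stranded: bool = True,
                  min_mapq: int = 100) -> Tuple[int, int]:
    """Count reads on the expected (sense) and unexpected (antisense) strand.

    With reverse_stranded=True (assumed for dUTP-based kits such as TruSeq
    Stranded), a read derived from a transcript aligns to the strand opposite
    the gene, so that case is counted as sense.
    """
    sense = antisense = 0
    for read_strand, gene_strand, mapq in alignments:
        if mapq <= min_mapq:
            continue  # keep only confidently (uniquely) placed reads
        same_strand = read_strand == gene_strand
        if same_strand != reverse_stranded:
            sense += 1
        else:
            antisense += 1
    return sense, antisense


def enrichment(sense: int, antisense: int) -> float:
    """Sense/antisense ratio; infinite when no antisense reads survive."""
    return float("inf") if antisense == 0 else sense / antisense
```

For example, 20 opposite-strand reads and 1 same-strand read on a "+" gene, all with MAPQ 255, give `(20, 1)` and an enrichment of 20.0. An extra read with MAPQ 3 is ignored. In practice the tuples would be built from a BAM file and a gene annotation.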

Just curious: is this paired-end stranded data? The latest JBrowse has code for handling that, but I wasn't sure from your description.

No, it's stranded single-end data.

9.6 years ago
Leszek 4.2k

I found highly variable sense-strand read enrichment (6.39-27.05; median 9.85) in our stranded RNA-Seq data (6 time points, with the 3 RNA fractions shown in different colours). I excluded every exon that overlaps a known antisense exon.

Overall, the red RNA fraction tends to have higher sense-strand enrichment than the blue and orange ones. Any idea why enrichment varies so much between time points and RNA fractions?

At the gene level, enrichment is slightly lower (5.98-24.07; median 8.74). These data are from zebrafish, whose annotation is far from perfect, so I suspect many antisense genes and exons are simply not annotated.

More about methodology here.
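The per-exon approach described above can be sketched as follows. Exons overlapping any annotated exon on the opposite strand are dropped, a sense/antisense ratio is computed for each remaining exon, and the median is reported. Several details are assumptions and may differ from the actual methodology. The `Exon` record is hypothetical and uses 0-based half-open coordinates. A pseudocount of 1 prevents division by zero. The overlap test is a naive pairwise scan; a real genome-wide run would use an interval tree or a dedicated overlap tool.

```python
import statistics
from typing import List, NamedTuple, Optional


class Exon(NamedTuple):
    chrom: str
    start: int          # 0-based, half-open [start, end): an assumption of this sketch
    end: int
    strand: str         # "+" or "-"
    sense: int = 0      # reads on the expected strand
    antisense: int = 0  # reads on the opposite strand


def overlaps_antisense(exon: Exon, annotation: List[Exon]) -> bool:
    """True if any annotated exon on the opposite strand overlaps this exon."""
    return any(
        other.chrom == exon.chrom
        and other.strand != exon.strand
        and other.start < exon.end
        and exon.start < other.end
        for other in annotation
    )


def median_enrichment(exons: List[Exon], annotation: List[Exon],
                      pseudocount: float = 1.0) -> Optional[float]:
    """Median per-exon sense/antisense ratio, skipping exons with antisense overlap."""
    ratios = [
        (e.sense + pseudocount) / (e.antisense + pseudocount)
        for e in exons
        if not overlaps_antisense(e, annotation)
    ]
    return statistics.median(ratios) if ratios else None
```

The gene-level numbers would follow the same pattern, with read counts summed per gene instead of per exon.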

9.7 years ago
Michele Busby ★ 2.2k

If your SOLiD data are PERFECTLY stranded, I suspect something is wrong with them. Things are rarely 100% perfect unless there is a mistake somewhere in the analysis, and you should see some legitimate biological antisense transcription.

In our experience TruSeq is very, very good: more than 90% of reads land on the strand where we expect them. But I have yet to see a perfect protocol.
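A fold enrichment r (sense:antisense) and a share f of reads on the expected strand are related by f = r / (r + 1) and r = f / (1 - f). By this relation, ">90% on the expected strand" means a ratio of at least 9:1, and the ~20x enrichment from the question corresponds to about 95%. The helper names below are illustrative.

```python
def fraction_on_expected_strand(fold: float) -> float:
    """Share of reads on the expected strand, given a sense:antisense ratio."""
    return fold / (fold + 1.0)


def fold_from_fraction(fraction: float) -> float:
    """Sense:antisense ratio implied by a share of reads on the expected strand."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return fraction / (1.0 - fraction)
```

For instance, `fraction_on_expected_strand(20)` is 20/21, about 0.952. `fold_from_fraction(0.9)` is 9 (up to floating-point rounding).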

I guess the SOLiD data I received had somehow been preprocessed, as I see almost no antisense reads...

Anyway, in your TruSeq data do you see 20x enrichment on the sense strand, or more?

In our experience, we've seen 20x or more enrichment on the sense strand with both the ScriptSeq and SMARTer protocols. We only noticed lower "strandedness" when the (clinical) samples were old or degraded.

Thanks Israel, that's really useful. So what I see (~20x) isn't that bad, right?

Yes: assuming what we see can be considered "normal", the same should apply to you. BTW, biases between library preparation methods are described in several articles (e.g. 1, 2), which is why we routinely run infer_experiment.py (RSeQC) during quality-control checks. Hope this helps.
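RSeQC's infer_experiment.py reports the fraction of reads explained by each strand rule. The parser below is a minimal sketch that turns that report into the same sense:antisense ratio discussed in this thread. It assumes lines of the form `Fraction of reads explained by "+-,-+": 0.9374`, so check the exact wording against the output of your RSeQC version. The sample numbers in the example are made up for illustration and do not come from this thread.

```python
import re
from typing import Dict

# Assumed line shape of RSeQC infer_experiment.py output, e.g.
#   Fraction of reads explained by "+-,-+": 0.9374
_RULE_LINE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def parse_infer_experiment(text: str) -> Dict[str, float]:
    """Map each strand-rule label to the fraction of reads it explains."""
    return {m.group(1): float(m.group(2)) for m in _RULE_LINE.finditer(text)}


def strand_ratio(fractions: Dict[str, float]) -> float:
    """Dominant rule over the other rule (assumes exactly two rules are reported)."""
    if len(fractions) != 2:
        raise ValueError("expected exactly two strand rules")
    high, low = sorted(fractions.values(), reverse=True)
    return float("inf") if low == 0 else high / low


example = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "++,--": 0.0454
Fraction of reads explained by "+-,-+": 0.9374
"""
```

With these made-up numbers, `strand_ratio(parse_infer_experiment(example))` is about 20.6. The "failed to determine" line is deliberately not matched and so does not enter the ratio.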

Thanks a lot for the links. I'll definitely have a look at these!
