Infer-experiment.py, is strand-specific?
1
1
Entering edit mode
9.9 years ago
int11ap1 ▴ 490

I have SE reads, which I want to know if they were sequenced with a strand-specific protocol or not. For that, I use Infer-experiment.py from the RSeQC package.

The output is something like this:

Fraction of reads explained by "++,--": 0.0158
Fraction of reads explained by "+-,-+": 0.9842
Fraction of reads explained by other combinations: 0.0000

What does it mean "++,--", "+-,-+", and other combinations?

Thanks.

alignment • 12k views
ADD COMMENT
2
Entering edit mode
9.9 years ago

This is explained in the documentation.

ADD COMMENT
0
Entering edit mode

So my dataset is strand-specific (fr-secondstrand), right?

ADD REPLY
0
Entering edit mode

Correct

ADD REPLY
2
Entering edit mode

Why is it fr-secondstrand and not fr-firststrand?

ADD REPLY
1
Entering edit mode

Oops, it should be fr-firststrand, good catch. I must have been thinking in htseq-count/featureCounts nomenclature and not tophat2 nomenclature!

ADD REPLY
0
Entering edit mode

Can you specify how you understood from the output below that it is fr-firststrand? In the documentation that just write that read mapped to '+' strand indicates parental gene on '-' strand. And fr-firststrand is defined as "the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced". I barely can unite those both definitions to conclude that the library is fr-firststrand.

ADD REPLY
1
Entering edit mode

fr-firststrand == dUTP-based library, so if SE reads on the + strand are mostly on genes on the - strand then that's what you want, since read #2 in a pair matches the strand in dUTP-based libraries.

ADD REPLY
0
Entering edit mode

if we have SE reads, why is there a pair and #2 read? Also, do you mean that the first sequenced read corresponds to the dUPT strand (i.e. first sequenced read is the exact copy of dUPT strand)?

ADD REPLY
0
Entering edit mode

It's easiest to just think in terms of paired end sequencing. For SE reads, you just have read #1 then. By dUTP, I mean a library made with the dUTP method. You can trace through how that works to figure out which strand is which (use a white board with multiple colors, that usually helps).

ADD REPLY
0
Entering edit mode

Second strand (not the original RNA) is preserved in dUTP methods. #1 read in a pair is complementary to the second strand (it represents the original RNA). #2 read in a pair is complementary to the original strand (it represents the second strand). Is it correct?

ADD REPLY
0
Entering edit mode

Well, the first synthesized strand, which is complementary to the ssRNA contains no dUTP. For the second strand synthesis, dTTP is replaced with dUTP, so UNG treatment can digest things. At least that's how it used to work (I don't do the library prep these days, so the exact steps may be completely different now). At the end of the days, read #1 is complementary and read #2 is on the same strand as the original gene/feature/whatever.

I think this is an image from one of the original single-strand RNAseq papers describing the method:

< image not found >

RNA is red, btw.

ADD REPLY
0
Entering edit mode

To which strand is read #1 complementary to? to original mRNA?

ADD REPLY
0
Entering edit mode

Yes, mRNA/gene coding strand/sense strand. So if the gene is on the + strand, read #1 will be reverse complemented.

ADD REPLY
0
Entering edit mode

Ok, just one last thing because it seems I am missing something. Let say gene is on Crick strand. mRNA is produced, then there is a process with dUTP etc. At the end the adapters are ligated to the second strand 3'-5' (complementary strand to original mRNA). This strand is then sequenced from 5' end, meaning that the read that is produced (the first read) is complementary to the second strand and therefore it is on the same strand as the original mRNA. So, I guess there is somewhere a mistake in my understanding because at the end #1 read is not complementary to mRNA.

ADD REPLY
0
Entering edit mode

There's lots of flipping that goes on once things are actually ligated onto the flow cell.

ADD REPLY
0
Entering edit mode

Figure 1 in this paper gives a good image of dUTP RNA-seq library protocol. Also this Chipster guideline gives a good background and also points out that aligners like hisat2 and bowtie really dropped the ball when they wrote the parameters "--fr/--rf/--ff" as this has nothing to do with first/second strand but the read orientation and is confusing when you read the strandness parameters "--rf/--fr" from StringTie and "--library-type" from TopHat.

ADD REPLY
0
Entering edit mode

However, I do not understand "Fraction of reads explained by other combinations". That webpage does not explain that...

ADD REPLY
1
Entering edit mode

It looks like that's the same as "Fraction of reads failed to determine", where the exact wording is different between versions. One would have to check the source code to be sure. Having said that, since you had no alignments fall into that group it doesn't really matter.

ADD REPLY

Login before adding your answer.

Traffic: 1509 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6