Hi all,
Can someone help me understand the RSeQC Output from infer_experiment.py?
So this is the output:
This is PairEnd Data
Fraction of reads failed to determine: 0.0560
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0192
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9247
So it's stranded but is it fr-firststrand or fr-secondstrand? I do not understand the help given here:
For pair-end RNA-seq, there are two different ways to strand reads (such as Illumina ScriptSeq protocol):
1++,1--,2+-,2-+
read1 mapped to '+' strand indicates parental gene on '+' strand
read1 mapped to '-' strand indicates parental gene on '-' strand
read2 mapped to '+' strand indicates parental gene on '-' strand
read2 mapped to '-' strand indicates parental gene on '+' strand
1+-,1-+,2++,2--
read1 mapped to '+' strand indicates parental gene on '-' strand
read1 mapped to '-' strand indicates parental gene on '+' strand
read2 mapped to '+' strand indicates parental gene on '+' strand
read2 mapped to '-' strand indicates parental gene on '-' strand
Thanks for your help!
Thanks! I would like to use HTSeq-count but also Stringtie, which takes either
So it's fr-firststrand, correct?
Could you quickly explain what "1+-,1-+,2++,2--" means?
Thanks!
Yes, you want --rf, which matches dUTP-based methods (the documentation for those options is horrendous). "1+-" means that read 1 mapped to the + strand when the gene itself was on the - strand. For dUTP-based methods, read 2 aligns in the same direction as the transcript from which the sequenced fragment arose ("read 2 sets the strand"), so you expect "2++" and "2--" to get more signal than "2+-" and "2-+".
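If it helps to see the notation spelled out, here is a minimal Python sketch of how each three-character token can be read: read number, strand the read mapped to, strand of the gene. The function names parse_token and read2_sets_strand are made up for illustration and are not part of RSeQC.

```python
def parse_token(token):
    """Split a token such as '1+-' into (read number, read strand, gene strand)."""
    if (len(token) != 3 or token[0] not in "12"
            or token[1] not in "+-" or token[2] not in "+-"):
        raise ValueError(f"unexpected token: {token!r}")
    return int(token[0]), token[1], token[2]


def read2_sets_strand(pattern):
    """True if every read-2 token maps to the same strand as its gene.

    That is the dUTP-style situation described above ("2++" and "2--").
    """
    tokens = [parse_token(t) for t in pattern.split(",")]
    return all(read_strand == gene_strand
               for number, read_strand, gene_strand in tokens
               if number == 2)


for pattern in ("1++,1--,2+-,2-+", "1+-,1-+,2++,2--"):
    for token in pattern.split(","):
        number, read_strand, gene_strand = parse_token(token)
        print(f"{token}: read {number} on '{read_strand}', gene on '{gene_strand}'")
    print(f"read 2 sets the strand: {read2_sets_strand(pattern)}")
```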
Hi Devon,
So if "1++,1--,2+-,2-+" gets more signal (the higher value), then it's fr-secondstrand,
and if "1+-,1-+,2++,2--" gets more signal, then it's fr-firststrand.
Am I understanding this right?
Yup. You don't see much "fr-secondstrand" data these days.
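For anyone who wants to automate that rule, below is a rough Python sketch that reads the text printed by infer_experiment.py and applies it. The classify function and the 0.8 cutoff are my own assumptions for illustration; RSeQC does not define a threshold, so pick one that suits your data.

```python
import re

# Mapping confirmed in this thread: which pattern corresponds to which library type.
PATTERN_TO_LIBRARY = {
    "1++,1--,2+-,2-+": "fr-secondstrand",
    "1+-,1-+,2++,2--": "fr-firststrand",
}

LINE_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def classify(report, threshold=0.8):
    """Return a library type if one pattern explains at least `threshold` of reads.

    `threshold` is an arbitrary, assumed cutoff, not something RSeQC specifies.
    """
    fractions = {pattern: float(value) for pattern, value in LINE_RE.findall(report)}
    for pattern, library in PATTERN_TO_LIBRARY.items():
        if fractions.get(pattern, 0.0) >= threshold:
            return library
    return "unstranded or unclear"


# The output posted at the top of this thread.
report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0560
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0192
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9247
'''

print(classify(report))  # fr-firststrand
```

If neither pattern clearly dominates (for example, both near 0.5), the sketch reports "unstranded or unclear" rather than guessing.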
Thanks - now I get it! One more question concerning HISAT2 and Stringtie. Do I have to set this tag when using HISAT2 in order for Stringtie (and other tools for that matter!) to work properly with strand-specific data?
The mapping should be the same apart from the XS tag, shouldn't it? Is this tag important for all the downstream tools? Thank you very much for your help!
It ends up not mattering much for alignments unless you're supplying a GTF file. For StringTie, it's quite useful, since then the resulting transcripts can have a meaningful strand assigned to them (makes doing things like finding ORFs and performing annotations a bit simpler).
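If you want to check whether your alignments actually carry the XS tag, a quick Python sketch like the following counts XS:A values in SAM-formatted text lines. The helper names and the two example records are invented for illustration; in practice you would feed it real lines, for example from samtools view.

```python
from collections import Counter


def xs_strand(sam_line):
    """Return the value of the XS:A tag ('+' or '-') or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("XS:A:"):
            return tag[len("XS:A:"):]
    return None


def count_xs(sam_lines):
    """Count XS:A values over alignment lines, skipping '@' header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        counts[xs_strand(line)] += 1
    return counts


# Hypothetical toy records, not real data.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXS:A:+",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
]

print(count_xs(example))
```

A large share of records under the None key would suggest the tag is missing from most alignments.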