Dear All,
I would like to calculate the percentage of reads assigned to globin genes and rRNA genes. At the moment I do not know whether my paired-end RNA-seq data were generated with a strand-specific protocol; I have asked the person in charge to let me know.
In the meantime, I ran htseq-count with all three strandedness options (no, yes, reverse). How do I get the strand (sense, antisense) information, and how should I interpret the Stranded:Reverse counts?
Globin genes   Stranded:No   Stranded:Yes   Stranded:Reverse
HBB            40204         40197          7
HBA1           38811         38795          16
HBA2           129847        129770         77
HBG1           1566          1566           0
HBG2           2750          2750           0
HBD            3             3              0
HBE1           1             0              1
HBZ            0             0              0
HBQ1           9             3              6
MB             4             0              4
CYGB           294           2              354
NGB            289           2              319
How should I interpret the differences among these three options?
Statistics from the special counters:
Special counters          Stranded:No   Stranded:Yes   Stranded:Reverse
__no_feature              56289350      94180089       56914563
__ambiguous               625347        18161          343824
__too_low_aQual           0             0              0
__not_aligned             0             0              0
__alignment_not_unique    30631662      30631662       30631662
I always like to see how things look in IGV. I recommend loading the tracks and checking them as Noolean proposed; additionally, I would pick a couple of small transcripts (with few reads) and check whether the read counts you see in IGV match those reported by HTSeq.
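As a quick numeric cross-check alongside IGV, one could compare the Stranded:Yes and Stranded:Reverse counts gene by gene. The sketch below is only a heuristic, not part of HTSeq: it uses the globin counts posted in the question, and the helper name `sense_fraction` and the `MIN_READS` cut-off are my own assumptions. For a gene with no overlapping antisense feature, an unstranded library would be expected to give roughly equal counts in both modes (fraction near 0.5), whereas a stranded library would push the fraction towards 1 or 0.

```python
# Counts copied from the globin table in the question:
# gene -> (stranded=no, stranded=yes, stranded=reverse)
COUNTS = {
    "HBB": (40204, 40197, 7),
    "HBA1": (38811, 38795, 16),
    "HBA2": (129847, 129770, 77),
    "HBG1": (1566, 1566, 0),
    "HBG2": (2750, 2750, 0),
    "HBD": (3, 3, 0),
    "HBE1": (1, 0, 1),
    "HBZ": (0, 0, 0),
    "HBQ1": (9, 3, 6),
    "MB": (4, 0, 4),
    "CYGB": (294, 2, 354),
    "NGB": (289, 2, 319),
}

MIN_READS = 50  # arbitrary cut-off (assumption): skip genes with too few reads


def sense_fraction(yes, reverse):
    """Fraction of strand-aware counts that fall under stranded=yes.

    Returns None when the gene has no reads in either mode.
    """
    total = yes + reverse
    return None if total == 0 else yes / total


for gene, (_no, yes, reverse) in COUNTS.items():
    frac = sense_fraction(yes, reverse)
    if frac is None or yes + reverse < MIN_READS:
        print(f"{gene:6s} too few reads to judge")
    else:
        print(f"{gene:6s} yes={yes:7d} reverse={reverse:7d} "
              f"sense_fraction={frac:.3f}")
```

With the posted numbers, HBB, HBA1 and HBA2 come out close to 1 while CYGB and NGB come out close to 0, and __no_feature is much larger under Stranded:Yes than under the other two options. A handful of genes is therefore not conclusive on its own, which is one more reason to confirm the protocol rather than infer it.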
Also, as Antonio said, the person who generated the library should be the one to provide this information.
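On the original goal (the percentage of reads coming from globin or rRNA genes), here is a minimal sketch once the right strandedness option is known. It relies on the htseq-count output layout of one tab-separated `gene<TAB>count` pair per line, with the special counters prefixed by `__`. The function name `percent_of_assigned` is hypothetical, and using only feature-assigned reads as the denominator is my assumption; you may prefer a different total.

```python
def percent_of_assigned(lines, gene_set):
    """Percentage of feature-assigned reads that belong to gene_set.

    `lines` is any iterable of htseq-count output lines ("gene<TAB>count").
    Special counters (names starting with "__") are excluded from the total.
    Returns 0.0 when no reads are assigned to any feature.
    """
    in_set = 0
    assigned = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene, count = line.split("\t")
        if gene.startswith("__"):
            continue  # __no_feature, __ambiguous, ... are not gene counts
        count = int(count)
        assigned += count
        if gene in gene_set:
            in_set += count
    return 100.0 * in_set / assigned if assigned else 0.0


# Illustration only: three Stranded:Yes rows from the question plus one
# special counter. A real htseq-count file lists every gene in the annotation.
example_lines = [
    "HBB\t40197",
    "HBA1\t38795",
    "CYGB\t2",
    "__no_feature\t94180089",
]
globin_genes = {"HBB", "HBA1", "HBA2", "HBG1", "HBG2"}
print(f"{percent_of_assigned(example_lines, globin_genes):.2f}% of assigned reads")
```

For a real run you would pass an open file handle instead of `example_lines`, and supply your own list of rRNA gene IDs in the same way as the globin set.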