Hi all,
I'm confused on how should I call htseq-count for my dataset concerning option --stranded [yes or no].
I'm analysing paired-end RNASeq data, at the alignment step with tophat I produced one single bam file from the two reads (R1, R2). So I have one bam file with both strands, this file is sorted by name with samtools and when calling htseq-count I firstly assumed I had to say stranded=yes, I got my counts and there were many zeros, so I run again with stranded=no and got many more read counts, less genes with zeros and genes generally with more counts. These are the numbers for both performances:
stranded=yes no feature = 49 258 937 ambiguous = 8 393
stranded=no no feature = 4 654 795 ambiguous = 1781 755
Since the number of "no feature" is considerably lower in stranded=no, should I use this tag? Is this a proof that this is right call?
So I don't know if my logic is right, I had strand specific data but since I merged them (R1 and R2) in the alignment step, I have everything together in the bam and I have to tell this to htseq-count by setting the -stranded=yes even if this is not completely right, any sense??
As another clue I run infer_experiment.py with the merged-sorted bam file from the RSeQC package and the output was:
This is PairEnd Data Fraction of reads failed to determine: 0.0012 Fraction of reads explained by "1++,1--,2+-,2-+": 0.0827 Fraction of reads explained by "1+-,1-+,2++,2--": 0.9161
This tells me I have strand-specific again ... so maybe I should come back to htseq-count --stranded=yes??
Any help is very welcome!! Thanks a lot!
If your data is strand specific you should tell it yes for strandness, well, htseq-count default already is for strand=yes. So you dont need to specify. The issue with your too many no_feature reads maybe is related with your alignment or annotation file. Maybe is a good idea try to quantify using RSEM or Kallisto.
Thanks a lot for the suggestion, I have run Kallisto and the numbers are similar to --stranded=no.
Try out Salmon, it'll infer your library prep from the raw reads in a handy text file.