HTSeq-count paired-end --stranded=yes gives less counts than --stranded=no
2
0
Entering edit mode
8.6 years ago
pablominguez ▴ 10

Hi all,

I'm confused on how should I call htseq-count for my dataset concerning option --stranded [yes or no].

I'm analysing paired-end RNASeq data, at the alignment step with tophat I produced one single bam file from the two reads (R1, R2). So I have one bam file with both strands, this file is sorted by name with samtools and when calling htseq-count I firstly assumed I had to say stranded=yes, I got my counts and there were many zeros, so I run again with stranded=no and got many more read counts, less genes with zeros and genes generally with more counts. These are the numbers for both performances:

stranded=yes no feature = 49 258 937 ambiguous = 8 393

stranded=no no feature = 4 654 795 ambiguous = 1781 755

Since the number of "no feature" is considerably lower in stranded=no, should I use this tag? Is this a proof that this is right call?

So I don't know if my logic is right, I had strand specific data but since I merged them (R1 and R2) in the alignment step, I have everything together in the bam and I have to tell this to htseq-count by setting the -stranded=yes even if this is not completely right, any sense??

As another clue I run infer_experiment.py with the merged-sorted bam file from the RSeQC package and the output was:

This is PairEnd Data Fraction of reads failed to determine: 0.0012 Fraction of reads explained by "1++,1--,2+-,2-+": 0.0827 Fraction of reads explained by "1+-,1-+,2++,2--": 0.9161

This tells me I have strand-specific again ... so maybe I should come back to htseq-count --stranded=yes??

Any help is very welcome!! Thanks a lot!

RNA-Seq • 6.3k views
ADD COMMENT
0
Entering edit mode

If your data is strand specific you should tell it yes for strandness, well, htseq-count default already is for strand=yes. So you dont need to specify. The issue with your too many no_feature reads maybe is related with your alignment or annotation file. Maybe is a good idea try to quantify using RSEM or Kallisto.

ADD REPLY
0
Entering edit mode

Thanks a lot for the suggestion, I have run Kallisto and the numbers are similar to --stranded=no.

ADD REPLY
1
Entering edit mode

Try out Salmon, it'll infer your library prep from the raw reads in a handy text file.

ADD REPLY
3
Entering edit mode
8.6 years ago

You want -s reverse in htseq-count (or -s 2 in featureCounts). That's also what the output from infer_experiment.py is indicating.

ADD COMMENT
0
Entering edit mode
8.6 years ago
wud • 0

I have met the same question. both the sequence report and the RSeQC package indicate that I had strand specific data. but when i set the htseq-count --stranded=yes or featurecount -s 1 (means strand speciafic), the number of reads mapping to the ref is lower than setting the parameter with no stranded special.

ADD COMMENT

Login before adding your answer.

Traffic: 3112 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6