I tested two different options while running htseq-count, -s no and -s reverse. These are the results:
For -s no:
__no_feature 435592
__ambiguous 953159
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 8164048
For -s reverse:
__no_feature 573728
__ambiguous 410510
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 8164048
With -s reverse there are fewer __ambiguous reads but more __no_feature reads than with -s no.
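To compare runs side by side, here is a minimal sketch that pulls the special "__" counters out of htseq-count output and reports how they shift between two modes. It assumes each output line is a whitespace-separated "name count" pair (htseq-count writes them tab-separated, with the gene lines first and the "__" counters at the end); the function name parse_special_counters is hypothetical.

```python
from typing import Dict


def parse_special_counters(text: str) -> Dict[str, int]:
    """Collect the special '__' counters from htseq-count output text.

    Gene lines ('gene_id<TAB>count') are skipped; only names starting
    with '__' are kept.
    """
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("__"):
            counters[parts[0]] = int(parts[1])
    return counters


# Counter values copied from the two runs in the question (gene lines omitted).
no_mode = parse_special_counters(
    "__no_feature\t435592\n__ambiguous\t953159\n__too_low_aQual\t0\n"
    "__not_aligned\t0\n__alignment_not_unique\t8164048\n"
)
reverse_mode = parse_special_counters(
    "__no_feature\t573728\n__ambiguous\t410510\n__too_low_aQual\t0\n"
    "__not_aligned\t0\n__alignment_not_unique\t8164048\n"
)

for name in ("__no_feature", "__ambiguous"):
    delta = reverse_mode[name] - no_mode[name]
    print(f"{name}: no={no_mode[name]} reverse={reverse_mode[name]} change={delta:+d}")
```

With the numbers above, __no_feature rises by 138136 and __ambiguous drops by 542649 when switching from -s no to -s reverse.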
As far as I know, the correct option depends on how the library was constructed, but when I was given these sequences nobody mentioned it. All I know is that the library was prepared with an Illumina protocol and that it was a paired-end RNA-seq experiment on peach (Prunus persica).
I'm inclined to think that fewer ambiguous reads is simply better, even at the cost of more no_feature reads.
So, which one is right?
===============
Edit:
These are the results from -s yes:
__no_feature 41467373
__ambiguous 506
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 8164048
Did you map the reads to the genome or transcriptome?
It was mapped to the genome using TopHat2.
So you should use -s yes. The differences you show are expected even with random reads.
That's not even something he tested, and it will certainly depend on the protocol: RNA-seq libraries can be stranded or unstranded.
Regarding the -s yes results: it looks like it's stranded, but as written below, you have to make sure which protocol was used.
A huge number of reads now ends up in __no_feature, which argues against -s yes. My guess is that the library is non-stranded, in which case it makes sense that there are more __ambiguous reads (they cannot be assigned because htseq-count has no strand information to use). __ambiguous means that the read could come from either gene A or gene B; without strand information it is impossible to tell which in the case of antisense transcripts.
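The antisense ambiguity can be illustrated with a toy model: two genes on opposite strands that overlap, and reads falling inside or outside the overlap. This is a simplified sketch of htseq-count's default union mode for a single ungapped read, not its actual implementation; the Gene class, the assign function, the gene names and the coordinates are all made up for illustration. As with -s reverse, the "reverse" mode compares the gene strand against the flipped read strand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def assign(read_start: int, read_end: int, read_strand: str,
           genes: List[Gene], mode: str) -> str:
    """Toy union-mode assignment of one ungapped read.

    mode mirrors -s: "no" ignores strand, "yes" requires the gene to be on
    the read's strand, "reverse" requires the gene to be on the opposite one.
    """
    hits = set()
    for g in genes:
        if not (read_start < g.end and g.start < read_end):
            continue  # no overlap with this gene
        if mode == "no":
            hits.add(g.name)
        elif mode == "yes" and g.strand == read_strand:
            hits.add(g.name)
        elif mode == "reverse" and g.strand != read_strand:
            hits.add(g.name)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()


# Hypothetical antisense pair: geneA on "+", geneB on "-", overlapping at 400-500.
genes = [Gene("geneA", 100, 500, "+"), Gene("geneB", 400, 800, "-")]

for mode in ("no", "yes", "reverse"):
    in_overlap = assign(420, 470, "+", genes, mode)
    outside = assign(600, 650, "+", genes, mode)
    print(f"-s {mode}: read in overlap -> {in_overlap}, read outside -> {outside}")
```

Without strand information the read in the overlap is __ambiguous; with either strand setting it resolves to one gene, while a read of the "wrong" orientation outside the overlap drops to __no_feature. This matches the pattern in the question: fewer __ambiguous, more __no_feature.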
I missed a digit :) In that case it's probably reverse, since most of the reads map in reverse mode.
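The reasoning in this last comment can be written as a rough heuristic: counting in the wrong orientation sends most reads to __no_feature, so a large imbalance between the -s yes and -s reverse runs suggests a stranded library, while similar values suggest an unstranded one. This is a sketch under assumptions, not an htseq-count feature; the function guess_strandedness and its ratio_threshold of 5 are arbitrary, hypothetical choices, and confirming the actual library protocol remains the reliable answer.

```python
def guess_strandedness(no_feature_yes: int, no_feature_reverse: int,
                       ratio_threshold: float = 5.0) -> str:
    """Guess the -s value from the __no_feature counts of two htseq-count runs.

    Heuristic only: the threshold is an arbitrary assumption. Returns
    "reverse", "yes" or "no" (unstranded).
    """
    if no_feature_yes <= 0 or no_feature_reverse <= 0:
        raise ValueError("counts must be positive")
    ratio = no_feature_yes / no_feature_reverse
    if ratio >= ratio_threshold:
        return "reverse"  # -s yes loses far more reads to __no_feature
    if ratio <= 1 / ratio_threshold:
        return "yes"      # -s reverse loses far more reads to __no_feature
    return "no"           # comparable losses: likely unstranded


# __no_feature counts from the -s yes and -s reverse runs in the question.
print(guess_strandedness(41467373, 573728))
```

With the counts from the question the ratio is roughly 72, so the heuristic points to reverse, in line with the comment above.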