Does better results in HTSeq-Count mean that the script was run correctly?
2
1
Entering edit mode
8.5 years ago

I tested two different options while running HTSeq-Count, -s no and -s reverse. This are the results:

For -s no:

__no_feature 435592

__ambiguous 953159

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For -s reverse:

__no_feature 573728

__ambiguous 410510

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For the option -s reverse there are lower ambiguous values but higher no_feature than for -s no. As far as I know this option depends on the construction of the library, but when they gave me this sequences they didn't mention it. All I know is that it's was constructed under a Illumina protocol and that it was a Paired-End experiment of RNA-Seq from peach (Prunus persica). I'm inclined to think that less ambiguous values are just better, even than with more no_feature values.

So, which one it's right?

===============

Edit: This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

RNA-Seq • 3.7k views
ADD COMMENT
0
Entering edit mode

Did you map the reads to the genome or transcriptome?

ADD REPLY
0
Entering edit mode

it was mapped to the genome using tophat2

ADD REPLY
1
Entering edit mode

So you should use -s yes. The differences you show are expected even with random reads

ADD REPLY
1
Entering edit mode

That's even not something he tested and will for sure depend on the protocol. You can have stranded or unstranded RNA seq.

ADD REPLY
0
Entering edit mode

This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

ADD REPLY
0
Entering edit mode

It looks like it's stranded but as written below, you have to make sure which protocol was used.

ADD REPLY
1
Entering edit mode

A huge number now ends up in no_feature which argues against -s yes. My guess is non-stranded, in which it makes sense that there are more __ambiguous reads (which can not be assigned since htseq-count has no strand information available). ambiguous means that the read can be from either geneA or geneB, without strand information impossible to say in the case of antisense transcripts.

ADD REPLY
0
Entering edit mode

I missed a digit :) In that case it's probably reverse since most of the reads do map in reverse mode

ADD REPLY
6
Entering edit mode
8.5 years ago

This is a pretty common method to determine the strandedness of a library and is essentially what RSeQC is doing. In this case, -s reverse is the correct setting (it's also the majority of what's produced these days). The general reasoning is:

  1. If it's an unstranded library, then each of the stranded methods will have ~2x more _no_feature counts than the -s no setting.
  2. If it's a stranded library, one of the stranded setting will have "slightly" higher _no_feature counts (because reality is annoying like that) and the other will have vastly higher _no_feature counts.
ADD COMMENT
0
Entering edit mode

replicate1 __no_feature 39795601 __ambiguous 73269 __too_low_aQual 2553578 __not_aligned 822830 __alignment_not_unique 6895933 Is this result normal? I used HIAST to do alignment. Because my sequencing data is Sanger/Illumina1.9, so I only set the quality value as Phred 33 and run other parameters as default. My alignment rate is about 90%. However, the HTseq result is quite unexpected. I have no clue where went wrong.

ADD REPLY
0
Entering edit mode

It is a stranded sequencing data, when I run HTseq, I didn't set -s argument since the default setting is yes.

ADD REPLY
0
Entering edit mode
8.5 years ago

This seems a dangerous assumption and in my opinion not valid. Try to find out which kit was used exactly, unless conclusions could be very very wrong.

ADD COMMENT
2
Entering edit mode

I would use RSeQC's infer_experiment.py in order to get an estimate of the library's strandedness. In case that you have no information about the kit.

ADD REPLY
0
Entering edit mode

I tried that but I dont know where to find a gene model for prunus persica

ADD REPLY

Login before adding your answer.

Traffic: 1806 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6