Question

Does better results in HTSeq-Count mean that the script was run correctly?

1

Entering edit mode

8.6 years ago

cr.balladares.m ▴ 10

I tested two different options while running HTSeq-Count, -s no and -s reverse. This are the results:

For -s no:

__no_feature 435592

__ambiguous 953159

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For -s reverse:

__no_feature 573728

__ambiguous 410510

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

For the option -s reverse there are lower ambiguous values but higher no_feature than for -s no. As far as I know this option depends on the construction of the library, but when they gave me this sequences they didn't mention it. All I know is that it's was constructed under a Illumina protocol and that it was a Paired-End experiment of RNA-Seq from peach (Prunus persica). I'm inclined to think that less ambiguous values are just better, even than with more no_feature values.

So, which one it's right?

===============

Edit: This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

RNA-Seq • 3.7k views

ADD COMMENT • link updated 8.6 years ago by Devon Ryan 105k • written 8.6 years ago by cr.balladares.m ▴ 10

0

Entering edit mode

Did you map the reads to the genome or transcriptome?

ADD REPLY • link 8.6 years ago by Asaf 10k

0

Entering edit mode

it was mapped to the genome using tophat2

ADD REPLY • link 8.6 years ago by cr.balladares.m ▴ 10

1

Entering edit mode

So you should use -s yes. The differences you show are expected even with random reads

ADD REPLY • link 8.6 years ago by Asaf 10k

1

Entering edit mode

That's even not something he tested and will for sure depend on the protocol. You can have stranded or unstranded RNA seq.

ADD REPLY • link 8.6 years ago by WouterDeCoster 47k

0

Entering edit mode

This are the results from -s yes:

__no_feature 41467373

__ambiguous 506

__too_low_aQual 0

__not_aligned 0

__alignment_not_unique 8164048

ADD REPLY • link 8.6 years ago by cr.balladares.m ▴ 10

0

Entering edit mode

It looks like it's stranded but as written below, you have to make sure which protocol was used.

ADD REPLY • link 8.6 years ago by Asaf 10k

1

Entering edit mode

A huge number now ends up in no_feature which argues against -s yes. My guess is non-stranded, in which it makes sense that there are more __ambiguous reads (which can not be assigned since htseq-count has no strand information available). ambiguous means that the read can be from either geneA or geneB, without strand information impossible to say in the case of antisense transcripts.

ADD REPLY • link 8.6 years ago by WouterDeCoster 47k

0

Entering edit mode

I missed a digit :) In that case it's probably reverse since most of the reads do map in reverse mode

ADD REPLY • link 8.6 years ago by Asaf 10k

0

Entering edit mode

8.6 years ago

WouterDeCoster 47k

This seems a dangerous assumption and in my opinion not valid. Try to find out which kit was used exactly, unless conclusions could be very very wrong.

ADD COMMENT • link 8.6 years ago by WouterDeCoster 47k

2

Entering edit mode

I would use RSeQC's infer_experiment.py in order to get an estimate of the library's strandedness. In case that you have no information about the kit.

ADD REPLY • link 8.6 years ago by michael.ante ★ 3.9k

0

Entering edit mode

I tried that but I dont know where to find a gene model for prunus persica

ADD REPLY • link 8.6 years ago by cr.balladares.m ▴ 10

score 6 · Accepted Answer · 2016-05-18

6

Entering edit mode

8.6 years ago

Devon Ryan 105k

This is a pretty common method to determine the strandedness of a library and is essentially what RSeQC is doing. In this case, -s reverse is the correct setting (it's also the majority of what's produced these days). The general reasoning is:

If it's an unstranded library, then each of the stranded methods will have ~2x more _no_feature counts than the -s no setting.
If it's a stranded library, one of the stranded setting will have "slightly" higher _no_feature counts (because reality is annoying like that) and the other will have vastly higher _no_feature counts.

ADD COMMENT • link 8.6 years ago by Devon Ryan 105k

0

Entering edit mode

replicate1 __no_feature 39795601 __ambiguous 73269 __too_low_aQual 2553578 __not_aligned 822830 __alignment_not_unique 6895933 Is this result normal? I used HIAST to do alignment. Because my sequencing data is Sanger/Illumina1.9, so I only set the quality value as Phred 33 and run other parameters as default. My alignment rate is about 90%. However, the HTseq result is quite unexpected. I have no clue where went wrong.

ADD REPLY • link 7.3 years ago by sophialovechan ▴ 80

0

Entering edit mode

It is a stranded sequencing data, when I run HTseq, I didn't set -s argument since the default setting is yes.

ADD REPLY • link 7.3 years ago by sophialovechan ▴ 80