Question

Htseq Count - Gene And Exon Problem

0

Entering edit mode

11.6 years ago

siddharth.sethi5 ▴ 290

Hello everyone,

I ran HTseq-count to cound reads for each GENE

python -m HTSeq.scripts.count -s no input.sam mm9.gtf > output_1

Then I ran it to count reads for each EXON

python -m HTSeq.scripts.count -s no -i exon_id input.sam mm9.gtf > output_2

But if I add up the read count for all exons for a particular gene from output_2 , it is not equal to the read count of that gene in output_1.

Example: gene Vip
output_1 : 85

output_2 : 41 (adding its all exons from output_2 )

Please can anyone tell me why this is happening , have I done something wrong.

HTSeq version : HTSeq-0.5.3p9

Thanks,

Sid

gene • 5.5k views

ADD COMMENT • link updated 10.9 years ago by Biostar 20 • written 11.6 years ago by siddharth.sethi5 ▴ 290

0

Entering edit mode

since you have a eukaryotic genome, are you sure that none of the reads are mapped to intronic positions of gene Vip ? to be sure about that may be you can cross check the results visually by using IGV (Integrative Genomics Viewer) or some other visualization tool.

ADD REPLY • link 11.6 years ago by Sudeep ★ 1.7k

0

Entering edit mode

Looking at this picture http://www-huber.embl.de/users/anders/HTSeq/doc/count.html and considering that the default mode is union I believe that Sudeep is right...reads overlapping two exons and the retained intron should be assigned to the gene in your first run but in the second case should be regarded as (I think) ambiguous. If you want to be sure re-run the script with a reduced gtf with a single gene information and compare the numbers with this in mind.

ADD REPLY • link 11.6 years ago by vodka ▴ 80