Htseq Count - Gene And Exon Problem
0
0
Entering edit mode
11.6 years ago

Hello everyone,

I ran HTseq-count to cound reads for each GENE

python -m HTSeq.scripts.count -s no input.sam mm9.gtf > output_1

Then I ran it to count reads for each EXON

python -m HTSeq.scripts.count -s no -i exon_id input.sam mm9.gtf > output_2

But if I add up the read count for all exons for a particular gene from output_2 , it is not equal to the read count of that gene in output_1.

Example: gene Vip
output_1 : 85

output_2 : 41 (adding its all exons from output_2 )

Please can anyone tell me why this is happening , have I done something wrong.

HTSeq version : HTSeq-0.5.3p9

Thanks,

Sid

gene • 5.5k views
ADD COMMENT
0
Entering edit mode

since you have a eukaryotic genome, are you sure that none of the reads are mapped to intronic positions of gene Vip ? to be sure about that may be you can cross check the results visually by using IGV (Integrative Genomics Viewer) or some other visualization tool.

ADD REPLY
0
Entering edit mode

Looking at this picture http://www-huber.embl.de/users/anders/HTSeq/doc/count.html and considering that the default mode is union I believe that Sudeep is right...reads overlapping two exons and the retained intron should be assigned to the gene in your first run but in the second case should be regarded as (I think) ambiguous. If you want to be sure re-run the script with a reduced gtf with a single gene information and compare the numbers with this in mind.

ADD REPLY

Login before adding your answer.

Traffic: 2117 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6