Entering edit mode
11.6 years ago
siddharth.sethi5
▴
290
Hello everyone,
I ran HTseq-count to cound reads for each GENE
python -m HTSeq.scripts.count -s no input.sam mm9.gtf > output_1
Then I ran it to count reads for each EXON
python -m HTSeq.scripts.count -s no -i exon_id input.sam mm9.gtf > output_2
But if I add up the read count for all exons for a particular gene from output_2 , it is not equal to the read count of that gene in output_1.
Example: gene Vip
output_1 : 85
output_2 : 41 (adding its all exons from output_2 )
Please can anyone tell me why this is happening , have I done something wrong.
HTSeq version : HTSeq-0.5.3p9
Thanks,
Sid
since you have a eukaryotic genome, are you sure that none of the reads are mapped to intronic positions of gene Vip ? to be sure about that may be you can cross check the results visually by using IGV (Integrative Genomics Viewer) or some other visualization tool.
Looking at this picture http://www-huber.embl.de/users/anders/HTSeq/doc/count.html and considering that the default mode is union I believe that Sudeep is right...reads overlapping two exons and the retained intron should be assigned to the gene in your first run but in the second case should be regarded as (I think) ambiguous. If you want to be sure re-run the script with a reduced gtf with a single gene information and compare the numbers with this in mind.