Entering edit mode
3.7 years ago
blur
▴
280
I aligned and created bam files for an experiment, and when I open a gene I see reads. But, then I ran HTSEq counts on the same file and the counts = 0
The cmd was:
htseq-count -n 6 file.bam mouse.gtf > htseq_counts_results
Why am I getting zero? Any help would be appreciated
EDIT - more info: The reference file looks like this (with chr)
chr1 havana gene 3143476 3144545 . + . gene_id "ENSMUSG00000102693"; gene_version "2"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC";
chr1 havana transcript 3143476 3144545 . + . gene_id "ENSMUSG00000102693"; gene_version "2"; transcript_id "ENSMUST00000193812"; transcript_version "2"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; transcript_name "4933401J01Rik-201"; transcript_source "havana"; transcript_biotype "TEC"; tag "basic"; transcript_support_level "NA (assigned to previous version 1)";
Alignment files look like this:
NB 16 chr12 31318525 60 110M33S * 0 0 AGACCGTGGACTCTGTAGAGAAGAAAGTCAATGAGATAAAAGACATCCTGGCCCAGAGCCCAGCAGCGGAACCACTGAAAAACATTGGCATTCTCTTCGAGGAGGCAGAGAAACTAACCAAAGATGTCACAGAAAAGATGACA AAEA<<A<<EAEEE<EAAAAAEEEE<AE<EAEAEAEEEE/EE/EEEEEEEA/EEAEEAEEEEEEEEE<AEEE/EEEEAEEEAEEEEAE</EEEEEEEEEEEEE<EEEEEEEEEEEEEAEEEEEEEEEEEEAEEEEEAEEEEA/ NM:i:0 MD:Z:110 AS:i:110 XS:i:19 SA:Z:chr12,31319919,-,109S31M3S,60,0;
Check that the chromosome and gene names are exactly the same in the bam and gtf (e.g. that they both are "chr1" and not one "chr1" and one "1", or for spaces in gene name or if one version of gene names has the isoform ".1" at the end and one not). Of course, posting a few lines of the bam and of the gtf would also increase the probability that we can help you!
The other possibility are mapping qualities of 0 (multimappers), these two options should be checked first.
The chromosome names look the same in the two files. The read you show as 60 as mapping quality which should be fine. Which aligner did you use? I have to admit that I usually use htseq-count after mapping with STAR which has a max quality of 255. However I would suggest you follow ATpoint suggestion and obtain a summary of the mapping qualities. This might be of the problem. Another suggestion is to manually check and see if you have in the BAM file reads that obviously fall within the genes declared in the gtf. For example, you could see if in the gtf, in chr 12 at position 31318525 you have a gene or not. In my opinion, the main suspects are still the quality of mapping and some difference between the BAM file and the gtf file.