4 months ago
Srujana
Hello,
I ran htseq-count on my BAM files and got a very high number of no_feature reads. I tried both a name-sorted and a position-sorted BAM file, but the result is the same in both cases. The chromosome names match between the BAM file and the GTF file. Can someone explain what could be causing this?
The command I used is:
htseq-count \
-f bam \
-r pos \
-m union \
-s reverse \
-t exon \
-i gene_id \
/path/to/sorted.bam \
/path/to/gencode.v14.annotation.gtf > /path/to/htseq_output.txt
My output:
no_feature 18678640
ambiguous 728599
too_low_aQual 8350383
not_aligned 8101491
alignment_not_unique 1812645
What is the reason you consider this a high number of no_feature assignments and why do you expect a smaller number?
I was trying to replicate an existing result. I followed the exact same procedure, but the previous result showed fewer 'no_feature' counts. This made me wonder whether there is a problem with my command or my files. I used the exact same BAM files that were used earlier.
This was the previous result:
no_feature 8547931
ambiguous 1327162
too_low_aQual 8350383
not_aligned 8101491
alignment_not_unique 4840073
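To make the comparison between the two runs explicit, here is a minimal Python sketch that subtracts the previous counters from the current ones. The numbers are copied from the two outputs above; the dictionary names and the `diff_counters` helper are only illustrative and are not part of HTSeq.

```python
# Special counters copied from the two htseq-count runs in this thread.
current = {
    "no_feature": 18678640,
    "ambiguous": 728599,
    "too_low_aQual": 8350383,
    "not_aligned": 8101491,
    "alignment_not_unique": 1812645,
}
previous = {
    "no_feature": 8547931,
    "ambiguous": 1327162,
    "too_low_aQual": 8350383,
    "not_aligned": 8101491,
    "alignment_not_unique": 4840073,
}


def diff_counters(new, old):
    """Return {counter: new - old} for every counter present in both runs."""
    return {key: new[key] - old[key] for key in new if key in old}


differences = diff_counters(current, previous)
for name, delta in differences.items():
    # "+d" prints an explicit sign so increases and decreases are easy to spot.
    print(f"{name:22s} {delta:+d}")
```

too_low_aQual and not_aligned are identical in both runs, while no_feature, ambiguous and alignment_not_unique differ. That is consistent with the same BAM file having been counted against a different annotation or with different options, though the counters alone cannot show which.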
Are you sure the GTF file is exactly the same? 'no_feature' means the alignments in the BAM file do not overlap any feature of the counted type in the annotation file. Different htseq-count parameters could also cause this: for example, if you used -t exon when they used -t gene or -t transcript, reads falling in introns would be counted as no_feature in your run but not in theirs. The strandedness setting (-s) might also matter, but I'm not sure whether those reads would fall into a different category.
Thanks for the reply. I changed the GTF file and now I am getting the expected result.
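For anyone who runs into the same issue, a quick way to check whether two annotation files really are equivalent is to tally the feature types (column 3) and sequence names (column 1) in each and compare the tallies. Below is a minimal Python sketch of that idea. The `summarize_gtf` function and the sample lines are made up for illustration; the code only assumes the standard tab-separated, nine-column GTF layout.

```python
from collections import Counter


def summarize_gtf(lines):
    """Count feature types (column 3) and sequence names (column 1) in GTF lines."""
    feature_types = Counter()
    seqnames = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        seqnames[fields[0]] += 1
        feature_types[fields[2]] += 1
    return feature_types, seqnames


# Tiny made-up GTF records, used only to show the output shape.
sample = [
    "##description: made-up example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tHAVANA\texon\t600\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr2\tHAVANA\tgene\t50\t400\t.\t-\t.\tgene_id "G2";\n',
]

feature_types, seqnames = summarize_gtf(sample)
print(dict(feature_types))
print(dict(seqnames))

# For a real file, pass an open handle instead of the sample list, e.g.:
# with open("/path/to/annotation.gtf") as handle:
#     feature_types, seqnames = summarize_gtf(handle)
```

Running this on both GTF files and comparing the two tallies should show quickly whether one file has far fewer exon records, or uses different chromosome names, than the other.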