I'm working with single-end Illumina RNA-seq data. After producing BAM files using tophat against hg19, I'm running cufflinks (against hg19 knownGene) and subsequently htseq-count (in order to generate count data for use in DESeq2).
The BAM files contain 35-40M aligned reads per sample (>90% of total reads in each case), and the BAMs look good in terms of alignment to the reference.
However, htseq-count is reporting only about 300,000 to 600,000 counted reads (representing ~7,000 transcripts). That is far below the total number of aligned reads, and far fewer than I would expect from viewing the BAM against the hg19 reference with gene annotations.
Cufflinks is producing FPKM values for ~21,000 transcripts. In contrast with the htseq-count output, this makes me think that either htseq-count is missing something or I have not configured it correctly.
Why are my htseq counts so low?
Any help appreciated.
An example feature read-out table:
It would be helpful to add the commands you used for mapping and htseq-count.
I'm using usegalaxy.org right now, so operating completely within the browser:
tophat:
htseq-count:
Is your RNA-seq stranded?
Yes, the RNA-seq is stranded.
I'll add that when looking at the BAM against hg19, it's clear that the majority of reads are exonic.
Also, setting strandedness to 'no' results in zero counts.
Am I missing some nuance between the cached hg19 reference I used for tophat and the hg19 knownGene refGene GTF file I used for htseq-count (which I exported from the UCSC Table Browser)? Are these two references incompatible?
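One quick way to check this kind of compatibility is to compare the sequence names in the GTF with the reference names in the BAM header, since a mismatch (for example `chr1` versus `1`) would leave reads with no overlapping features. Below is a minimal Python sketch of that comparison. The function names and the toy GTF lines are hypothetical, and the BAM reference names are assumed to have been pulled out separately (for instance from the BAM header).

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names (first column) used in GTF text lines."""
    names = set()
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        names.add(line.split("\t")[0])
    return names


def compare_seqnames(bam_names, gtf_names):
    """Report which sequence names are shared and which appear on one side only."""
    bam_names, gtf_names = set(bam_names), set(gtf_names)
    return {
        "shared": sorted(bam_names & gtf_names),
        "only_in_bam": sorted(bam_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - bam_names),
    }


# Hypothetical toy GTF lines, for illustration only (not real annotation).
example_gtf = [
    "# toy annotation",
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";',
    'chr2\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "geneB"; transcript_id "txB";',
]

matching = compare_seqnames(["chr1", "chr2"], gtf_seqnames(example_gtf))
mismatched = compare_seqnames(["1", "2"], gtf_seqnames(example_gtf))
print(matching["shared"])    # names present in both BAM and GTF
print(mismatched["shared"])  # empty if the naming conventions differ
```

If `shared` comes back empty or much smaller than expected, the annotation and the alignment reference are probably using different naming conventions.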
Based on the previous comment, I rechecked my tophat pipeline; you were correct that the data is unstranded and, hence, my initial setting for htseq (Stranded = Yes) was incorrect. Thanks for pointing me in the right direction.
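For anyone hitting the same problem, it can help to total the htseq-count output and look at the special counters at the bottom of the file, since a very large `__no_feature` value relative to the assigned reads is a hint that the strandedness setting or the annotation does not match the data. The following is a rough Python sketch. It assumes the usual two-column, tab-separated output with special counters prefixed by `__` (older versions may name them differently); the function name and the toy numbers are made up for illustration.

```python
def summarise_htseq(lines):
    """Summarise htseq-count output given as 'feature<TAB>count' lines."""
    assigned = 0          # reads assigned to real features
    nonzero_features = 0  # features with at least one read
    special = {}          # counters such as __no_feature, __ambiguous
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, count = line.split("\t")
        count = int(count)
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
            if count > 0:
                nonzero_features += 1
    return assigned, nonzero_features, special


# Toy output with invented numbers, for illustration only.
toy_output = [
    "geneA\t120",
    "geneB\t0",
    "geneC\t30",
    "__no_feature\t800",
    "__ambiguous\t10",
    "__too_low_aQual\t0",
    "__not_aligned\t0",
    "__alignment_not_unique\t40",
]

assigned, nonzero_features, special = summarise_htseq(toy_output)
total = assigned + sum(special.values())
print(f"assigned: {assigned} of {total} reads ({assigned / total:.1%})")
print(f"features with counts: {nonzero_features}")
print(f"no_feature: {special.get('__no_feature', 0)}")
```

Running the same summary on outputs produced with each strandedness setting makes it easy to see which setting assigns the most reads.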
I also did some digging and found this previous post, which clarified the issue for me a bit more.
Well, that already explains the low counts: about half of the reads were ignored because they were on the opposite strand to the annotation. Good luck with the rest of your project.
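To illustrate the point about strand, here is a toy Python sketch of the rule htseq-count applies to single-end reads: with stranded = yes a read only counts if it maps to the same strand as the feature, with stranded = reverse only if it maps to the opposite strand, and with stranded = no the strand is ignored. The function names and the simulated reads are hypothetical; this is a simplification of the idea, not the tool's actual implementation.

```python
def strand_compatible(read_strand, feature_strand, stranded):
    """Return True if a single-end read would be counted for the feature."""
    if stranded == "no":
        return True
    if stranded == "yes":
        return read_strand == feature_strand
    if stranded == "reverse":
        return read_strand != feature_strand
    raise ValueError(f"unknown stranded mode: {stranded!r}")


def counted_fraction(read_strands, feature_strand, stranded):
    """Fraction of overlapping reads counted under a given stranded mode."""
    counted = sum(
        strand_compatible(s, feature_strand, stranded) for s in read_strands
    )
    return counted / len(read_strands)


# Simulated unstranded library: reads over a '+' gene land on both strands
# in equal numbers (toy data, assumed for illustration).
reads = ["+", "-"] * 50

for mode in ("yes", "reverse", "no"):
    print(mode, counted_fraction(reads, "+", mode))
```

For an unstranded library like this toy one, both stranded modes throw away the reads on the "wrong" strand, and only the unstranded mode keeps them all.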