Full disclosure: I am really new to RNA sequencing.
I am using bowtie, tophat, and htseq to build a matrix of read counts for my samples. I built my reference genome from CGD's "chromosomes" file. Everything seems to be going well.
My understanding is that the haploid genome has 6620 features in total. My data set is diploid, which should give me 13,280 total features. However, my resulting counts matrix has only about 12,800 rows. Shouldn't I expect 13,280 rows, since each row corresponds to a feature?
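One way to check where the row count comes from is to count the distinct IDs in the annotation for the feature type and ID attribute passed to htseq-count (its -t and -i options). The counts table has one row per distinct ID, plus a few summary rows whose names start with "__". This is a minimal sketch that assumes the annotation is GFF3 or GTF. The function names are hypothetical, and the feature type and attribute you pass should match your own htseq-count command.

```python
# Hypothetical helpers: count the distinct feature IDs an htseq-count run
# would report, and split an htseq-count table into feature rows and
# "__" summary rows. Assumes GFF3 (key=value) or GTF (key "value") attributes.


def parse_attributes(field):
    """Parse the 9th (attributes) column of a GFF3 or GTF line into a dict."""
    attrs = {}
    for part in field.strip().strip(";").split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep or " " in key:  # GTF style: gene_id "g1"
            key, _, value = part.partition(" ")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def distinct_ids(lines, feature_type="exon", id_attr="gene_id"):
    """Return the set of id_attr values over features of type feature_type.

    The defaults mirror htseq-count's -t/-i defaults; features lacking the
    attribute are skipped here.
    """
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # not a feature line, or a different feature type
        value = parse_attributes(cols[8]).get(id_attr)
        if value is not None:
            ids.add(value)
    return ids


def count_table_rows(lines):
    """Return (feature rows, summary rows) in an htseq-count output table."""
    rows = [line for line in lines if line.strip()]
    summary = [line for line in rows if line.startswith("__")]
    return len(rows) - len(summary), len(summary)
```

For example, with a hypothetical GFF3 file counted by gene ID, `len(distinct_ids(open("annotation.gff"), "gene", "ID"))` can be compared with the first value that `count_table_rows` returns for the htseq-count output. A mismatch would suggest the gap is downstream of the annotation. A match would suggest the annotation itself simply has that many IDs of that type.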
Numbers are from: http://www.candidagenome.org/cache/C_albicans_SC5314_genomeSnapshot.html
What do you mean by feature? RNA-seq reads are typically quantified against genes or transcripts. Diploid means there are two copies of each chromosome, and the two copies contain mostly identical features (an allele of a gene is still the same gene). I am confused that the table you link to doubles the feature count.
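To illustrate the point that an allele of a gene is still the same gene: if the annotation gives each allele its own ID, you can sum allele-level counts into gene-level counts. This sketch assumes a hypothetical naming convention in which allele IDs end in "_A" or "_B". Check the IDs in your own annotation before relying on it. Note that it only regroups rows. It cannot recover reads that were set aside as non-unique because they aligned equally well to both alleles.

```python
import re
from collections import defaultdict

# Hypothetical convention: allele-specific IDs end in "_A" or "_B".
# Verify this against the IDs in your own annotation first.
ALLELE_SUFFIX = re.compile(r"_[AB]$")


def read_htseq_table(lines):
    """Parse tab-separated 'feature<TAB>count' lines into a dict."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue
        feature, n = line.rstrip("\n").split("\t")
        counts[feature] = int(n)
    return counts


def collapse_alleles(counts):
    """Sum allele-level counts into gene-level counts.

    htseq-count's '__' summary rows are skipped; every other ID has its
    (hypothetical) allele suffix removed before summing.
    """
    genes = defaultdict(int)
    for feature, n in counts.items():
        if feature.startswith("__"):
            continue
        genes[ALLELE_SUFFIX.sub("", feature)] += n
    return dict(genes)
```

Under this convention, a table that lists both alleles of every gene would collapse to half as many gene rows.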