Hi guys,
I would like to calculate RPKM values for my RNA-seq experiment. According to the formula, I need to know the gene length. My starting point is the raw read counts (single-end) produced by:
htseq-count -t exon -i gene_id my_file.sam gencode_v19.gtf > my_count.txt
To calculate the gene length I used the .gtf file from http://www.gencodegenes.org/releases/19.html.
The .gtf file contains start and end positions of features annotated as exon, transcript, CDS, start_codon, etc. I think I should consider only the exons of each gene to calculate its length, but I'm not sure. My final goal is to calculate gene-level expression for my samples; I'm not interested in splice variants, for example.
Is it right to use only the exons to calculate the gene length for my purpose?
Thank you in advance and kind regards
B
If you use featureCounts from Subread (http://subread.sourceforge.net/) instead of htseq-count for your quantification, you will get the gene length automatically. Just a suggestion.
The length reported by featureCounts is wrong, because it includes the length of the introns!
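For the exon-based length asked about in the question, here is a minimal Python sketch, not a definitive implementation. It assumes a GENCODE-style GTF (tab-separated, with `gene_id "..."` in the ninth column) and defines gene length as the number of bases covered by at least one exon of the gene, so overlapping exons from different transcripts are counted once and introns are excluded. The function names `exon_union_lengths` and `rpkm` are made up for this example; they do not come from HTSeq or Subread.

```python
import re
from collections import defaultdict

# Assumption: attributes follow the GTF convention gene_id "ENSG...";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def exon_union_lengths(gtf_lines):
    """Return {gene_id: bases covered by at least one exon of that gene}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep exon records only (same idea as -t exon)
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[match.group(1)].append((fields[0], int(fields[3]), int(fields[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_chrom, cur_start, cur_end = intervals[0]
        for chrom, start, end in intervals[1:]:
            if chrom == cur_chrom and start <= cur_end + 1:
                # Overlapping or adjacent exon: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_chrom, cur_start, cur_end = chrom, start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)
```

To use it, pass an open file handle for the .gtf file to `exon_union_lengths`, then call `rpkm` with each gene's count from my_count.txt. What to use as `total_mapped_reads` (all mapped reads, or only the reads assigned to genes) is a convention you have to choose and apply consistently across samples.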