Hi everyone
I understand the RPKM formula is as follows:
C = Number of reads mapped to a gene
N = Total mapped reads in the experiment
L = exon length in base-pairs for a gene
RPKM = (10^9 * C) / (N * L)
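To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function name `rpkm` and the example numbers are made up for illustration; they do not come from real data.

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads: (10^9 * C) / (N * L)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return (1e9 * count) / (total_mapped_reads * gene_length_bp)


# Hypothetical values: C = 500 reads, L = 2000 bp, N = 10 million reads
example = rpkm(count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example)  # 25.0
```

The 10^9 is just 10^3 (bases per kilobase) times 10^6 (reads per million), so the same value comes out of (C / (L / 1e3)) / (N / 1e6).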
I have the counts (from HTSeq) and the transcript length (retrieved from the Ensembl API) for each gene.
My question is: for the total mapped reads (N), should I count only the reads that fall in the exons of all genes? If that's right, can I just add up all the gene counts from the HTSeq output to get the total mapped reads?
Or should N be ALL the mapped reads in the BAM file?
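For reference, the first option (adding up the gene counts) could be sketched roughly as below. This assumes the usual two-column, tab-separated htseq-count output and assumes the special counter lines start with `__` (e.g. `__no_feature`, `__ambiguous`), as in recent HTSeq versions; older versions may not use that prefix. The function name `sum_gene_counts` and the example counts are hypothetical.

```python
import io


def sum_gene_counts(handle):
    """Return (per-gene counts, their sum) from htseq-count style lines."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        # Assumption: special counters are prefixed with "__" and are not genes
        if feature.startswith("__"):
            continue
        counts[feature] = int(value)
    return counts, sum(counts.values())


# Made-up output for illustration; a real file would be opened with open(path)
example_output = io.StringIO(
    "geneA\t100\n"
    "geneB\t300\n"
    "__no_feature\t50\n"
    "__ambiguous\t10\n"
    "__too_low_aQual\t5\n"
    "__not_aligned\t20\n"
    "__alignment_not_unique\t15\n"
)
counts, n_assigned = sum_gene_counts(example_output)
print(n_assigned)  # 400
```

With this choice N is 400 in the toy example, whereas counting every mapped read in the BAM would also include reads that HTSeq reported as no_feature or ambiguous, so the two options give different N.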
Can somebody confirm that this is the same equation as http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/? Why is 10^9 used instead of 1e6? CORRECTION: never mind, I didn't see the 1e3 incorporated in one of the other equations.