Hi, I have a question about calculating RPKM.
I have seen people calculate the RPKM for a gene using all the reads that fall in the introns and exons of that gene.
If I just want the RPKM for individual isoforms, can I use only the reads mapped to exons?
For example, Gene A has 4 exons:
Chr2 MSU_osa1r6 gene 4542759 4544980 . + . ID=13102.t00754;Name=unknown gene
Chr2 MSU_osa1r6 CDS 4543031 4543177 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4543287 4543709 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4543836 4543952 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4544064 4544423 . + 0 Parent=13102.m00974
There are 4,011 reads that map to this gene (between positions 4542759 and 4544980).
There are 4 exons for this particular gene which contain a total of 1,043 base pairs.
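As a side check on the length figure, here is a minimal Python sketch that sums the four CDS features listed above. The function name `feature_lengths` and its `inclusive` switch are made up for this illustration. GFF coordinates are 1-based and inclusive, so each feature spans end - start + 1 bases; taking plain end - start instead reproduces the 1,043 bp quoted in this post, while the inclusive count comes out 4 bp longer.

```python
# CDS lines copied from the post (whitespace-separated here; a real GFF3 file uses tabs).
GFF_LINES = """\
Chr2 MSU_osa1r6 CDS 4543031 4543177 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4543287 4543709 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4543836 4543952 . + 0 Parent=13102.m00974
Chr2 MSU_osa1r6 CDS 4544064 4544423 . + 0 Parent=13102.m00974
"""


def feature_lengths(gff_text, feature_type="CDS", inclusive=True):
    """Return the length in bp of every feature of the given type.

    Hypothetical helper for illustration only. With inclusive=True the
    length is end - start + 1 (GFF convention); with inclusive=False it
    is end - start, which is what gives 1,043 bp in the post.
    """
    lengths = []
    for line in gff_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[2] != feature_type:
            continue
        start, end = int(fields[3]), int(fields[4])
        lengths.append(end - start + 1 if inclusive else end - start)
    return lengths


total_inclusive = sum(feature_lengths(GFF_LINES))
total_difference = sum(feature_lengths(GFF_LINES, inclusive=False))
print("inclusive length (bp):", total_inclusive)
print("end - start length (bp):", total_difference)
```

The difference is small for this gene, but whichever convention is chosen should be applied consistently to every gene and sample being compared.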
So the RPKM for this particular gene is (4,011 reads / 1.043 kb of exon) / 31.8 million mapped reads = 120.9 RPKM.
To calculate the RPKM for an isoform, I want to use only the reads in exons, which is about 3,000 reads.
So my RPKM is (3,000 reads / 1.043 kb of exon) / 31.8 million mapped reads = about 90 RPKM.
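The two calculations can be written out as a short Python sketch. The function name `rpkm` and the constant names are hypothetical; the read counts, exon length, and library size are the figures quoted in this post. It uses the usual definition, reads per kilobase of feature per million mapped reads.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Equivalent to (read_count / length_in_kb) / mapped_reads_in_millions,
    written with a single 1e9 factor to avoid intermediate rounding.
    """
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


TOTAL_MAPPED_READS = 31_800_000  # 31.8 million mapped reads, from the post
EXON_LENGTH_BP = 1043            # total exon length as stated in the post

# All reads between the gene start and end (introns included).
gene_rpkm = rpkm(4011, EXON_LENGTH_BP, TOTAL_MAPPED_READS)

# Only the roughly 3,000 reads that fall inside exons.
exon_only_rpkm = rpkm(3000, EXON_LENGTH_BP, TOTAL_MAPPED_READS)

print(f"gene-wide reads: {gene_rpkm:.1f} RPKM")
print(f"exon-only reads: {exon_only_rpkm:.1f} RPKM")
```

Note that the length must be in kilobases (1.043), not base pairs (1043), if the division is done by hand in two steps.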
If I just want to compare the expression level of this isoform across samples, is it OK to do that?
Thanks in advance.
Cufflinks assembles the isoforms first, right? I just want to use the known isoforms in RefSeq.
No. You can run it with a GTF annotation file, in which case it doesn't assemble.
In my view, isoform expression levels should not include intronic reads, since we are only interested in mRNA expression.