Question

How To Calculate FPKM (Fragments Per Kilobase Of Exon Per Million Fragments Mapped)

21

14.1 years ago

User 5402 ▴ 210

Hello,

Perhaps this has already been answered somewhere, but I have not found a satisfactory explanation. I want to understand how one calculates FPKM (fragments per kilobase of exon per million fragments mapped) in RNA-seq data. Everywhere I look, people say that it is the number of reads aligned per kilobase of transcript per million mappable reads in the whole dataset, and that the difference between RPKM and FPKM is that, for paired-end data, one fragment is a pair of reads. If I have any aspect of that wrong, please let me know.

If the above is right, then how is it that Cufflinks is able to find transcripts that are as low as 10^-12 FPKM? How is that possible?

So I tried a back-of-the-envelope calculation on a gene that has a very low FPKM as reported by Cufflinks. This gene's combined exons total ~3 kb. It has ~2000 reads aligned by TopHat, and the dataset has ~24 million reads in total. If I understand the calculation correctly, the gene's FPKM should be about 28, or at least somewhere near that order of magnitude. Instead, the Cufflinks output says that it has an FPKM of 2.9531e-12. What am I missing or doing wrong? How can any transcript have such a low FPKM/RPKM? If the dataset size is in the range of 10-100 million reads, then to get a number like 10^-12 with even just 1 read/fragment, you would need a transcript larger than the human genome.
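The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only the naive formula from the FPKM definition, not what Cufflinks does internally; the function names are made up for illustration, and each aligned read is treated as one fragment, which is an assumption.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments."""
    kilobases = length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def length_needed_bp(target_fpkm, fragments, total_mapped_fragments):
    """Transcript length (bp) at which the naive formula would yield target_fpkm."""
    return 1e9 * fragments / (target_fpkm * total_mapped_fragments)


# Figures quoted in the question, treating each read as one fragment (an assumption).
naive = fpkm(2_000, 3_000, 24_000_000)
print(f"naive FPKM: {naive:.1f}")

# Length a transcript with a single fragment would need in order to reach 1e-12 FPKM.
needed = length_needed_bp(1e-12, 1, 24_000_000)
print(f"length needed: {needed:.2e} bp")
```

With these inputs the naive FPKM comes out near 28, and the length needed for 1e-12 FPKM is on the order of 10^13 bp, several thousand times the roughly 3.1 billion bp of the human genome.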

So I know I must not be understanding this right. Thank you in advance for your help!

rna fpkm rpkm • 107k views

ADD COMMENT • link written 14.1 years ago by User 5402 ▴ 210

1

You are totally right, there is no way of getting near that number using the RPKM formula (10^9 * 2000 / (3000 * 2.4e7) ≈ 28). Is the FPKM formula maybe different? Is it documented how Cufflinks calculates this?

ADD REPLY • link 14.1 years ago by Michael 55k

1

Paired-end based "fragments per kilobase of exon per million fragments mapped" (FPKM) is analogous to single-end based "reads per kilobase of exon per million mapped reads" (RPKM) and is "simply a nomenclature change to better reflect what RNA-Seq actually measures".

http://cufflinks.cbcb.umd.edu/howitworks.html http://cufflinks.cbcb.umd.edu/

ADD REPLY • link 14.1 years ago by Casey Bergman 18k

0

Cufflinks uses a statistical model to calculate FPKM. It is described in the supplementary methods of the Cufflinks paper. Even when running Cufflinks, you have to input the mean and variance of the read length distribution (for single reads). The results vary with different parameters.

ADD REPLY • link 11.8 years ago by Bharat Iyengar ▴ 330

score 3 · Answer 1 · 2011-03-23

3

14.1 years ago

Mikael Huss 4.8k

Hmm. Are you looking at the gene level or the transcript level? If you are looking at transcript FPKMs and the gene in question has alternative transcripts, one of the isoforms could get a zero estimate while another (or several others) would get the reads assigned to it/them.
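A toy illustration of that gene-level versus transcript-level difference, in Python. The isoform names, lengths, and fragment assignments are invented, the naive FPKM formula stands in for Cufflinks' actual statistical model, and taking the gene-level value as the sum of its isoform values is an assumption of this sketch.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Naive FPKM, used here only as a stand-in for a real abundance estimate."""
    return 1e9 * fragments / (length_bp * total_mapped_fragments)


TOTAL = 24_000_000  # total mapped fragments, figure taken from the question

# Hypothetical gene with two isoforms: name -> (length in bp, fragments assigned).
isoforms = {
    "isoform_A": (3_000, 2_000),  # receives all of the gene's fragments
    "isoform_B": (2_500, 0),      # receives none
}

transcript_fpkm = {
    name: fpkm(n_fragments, length_bp, TOTAL)
    for name, (length_bp, n_fragments) in isoforms.items()
}
gene_fpkm = sum(transcript_fpkm.values())  # assumed gene-level summary

for name, value in transcript_fpkm.items():
    print(f"{name}: {value:.2f}")
print(f"gene: {gene_fpkm:.2f}")
```

Here the gene as a whole looks well expressed, while one of its isoforms sits at exactly zero.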

I have never seen Cufflinks FPKMs as low as 1e-12. (Except for 0, of course!) The smallest values I get tend to be around 0.0001 (1e-4). Could it be a numerical issue, where the estimate is really zero, but the program reports a very small value instead? (Although in those cases, I think the values tend to be even smaller, around 1e-16 depending on the machine.)
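If the tiny values are indeed numerical residue, one pragmatic way to handle them downstream is to clamp anything below a cutoff to zero. The sketch below is Python; the cutoff of 1e-6 and the helper name `clean_fpkm` are hypothetical choices for illustration, not Cufflinks settings.

```python
import sys

# Relative spacing of double-precision floats near 1.0 (about 2.2e-16).
# Estimates near this scale are more likely rounding residue than real signal.
print(sys.float_info.epsilon)

ZERO_CUTOFF = 1e-6  # hypothetical threshold, chosen only for this example


def clean_fpkm(value, cutoff=ZERO_CUTOFF):
    """Return 0.0 for estimates below the cutoff, otherwise the value unchanged."""
    return 0.0 if value < cutoff else value


reported = [2.9531e-12, 1e-4, 28.0]
cleaned = [clean_fpkm(v) for v in reported]
print(cleaned)
```

Where to put the cutoff is a judgment call; it should sit well below the smallest values you consider plausible for your data.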

ADD COMMENT • link 14.1 years ago by Mikael Huss 4.8k

0

This specific gene only has one transcript according to Cufflinks.

Do you think there could be a problem with the way I am running Cufflinks? I have just been using the defaults and running the TopHat output against the human .gtf file. I am seeing ~30 transcripts that do not fail in the Cufflinks status but have FPKM values from 1e-5 through 1e-12.

Most importantly, though, how is Cufflinks calculating an FPKM of 2.9531e-12 (or 0, if that is a numerical issue) internally? That still makes no sense to me.

ADD REPLY • link 14.1 years ago by User 5402 ▴ 210

0

I'm stumped - I can't think of any way to run the program so that you would end up with values like that. I think you'll have to email and ask the developers of Cufflinks.

ADD REPLY • link 14.1 years ago by Mikael Huss 4.8k

1

Did you find the reason for Cufflinks' low FPKM values?

ADD REPLY • link 11.2 years ago by GouthamAtla 12k

1

Probably just a numerical issue. I would consider it zero.

ADD REPLY • link 11.2 years ago by Mikael Huss 4.8k