
Which Expression Units to Use, FPKM or RPKM?

12.4 years ago

biorepine ★ 1.5k

Dear Biostars! I think this is one of the common problems in RNA-Seq expression analysis: which expression units to use, FPKM or RPKM. People who use Cufflinks end up with FPKM, and people who use ERANGE end up with RPKM. Cufflinks has a nice explanation of why FPKM saves us from the skewed expression values reported by other software, especially with paired-end read data:

They're almost the same thing. RPKM stands for Reads Per Kilobase of transcript per Million mapped reads. FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads. In RNA-Seq, the relative expression of a transcript is proportional to the number of cDNA fragments that originate from it. Paired-end RNA-Seq experiments produce two reads per fragment, but that doesn't necessarily mean that both reads will be mappable. For example, the second read is of poor quality. If we were to count reads rather than fragments, we might double-count some fragments but not others, leading to a skewed expression value. Thus, FPKM is calculated by counting fragments, not reads.
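To make the quoted definitions concrete: both units use one formula, a count scaled by transcript length in kilobases and by the library's total count in millions. RPKM plugs in reads and FPKM plugs in fragments. Below is a minimal Python sketch assuming per-gene counts are already available; the gene names, lengths, and counts are hypothetical. Both genes have the same number of fragments, but more of geneA's fragments have both mates mapped, which is the situation the quote warns about.

```python
def per_kb_per_million(count, length_bp, total_count):
    """Scale a raw count by transcript length (kb) and library size (millions).

    RPKM when `count`/`total_count` are reads; FPKM when they are fragments.
    """
    return count * 1e9 / (length_bp * total_count)

# Hypothetical toy data per gene:
# (fragments with both mates mapped, fragments with one mate mapped, length in bp)
genes = {
    "geneA": (900, 100, 2000),
    "geneB": (100, 900, 2000),
}

# Fragment counts: each fragment counts once, however many mates mapped.
fragments = {g: both + single for g, (both, single, _) in genes.items()}
# Read counts: a fragment with both mates mapped contributes two reads.
reads = {g: 2 * both + single for g, (both, single, _) in genes.items()}

total_fragments = sum(fragments.values())
total_reads = sum(reads.values())

fpkm = {g: per_kb_per_million(fragments[g], genes[g][2], total_fragments) for g in genes}
rpkm = {g: per_kb_per_million(reads[g], genes[g][2], total_reads) for g in genes}

for g in genes:
    print(f"{g}: FPKM={fpkm[g]:.1f}  RPKM={rpkm[g]:.1f}")
```

With these numbers both genes get the same FPKM, while geneA gets a higher RPKM than geneB only because more of its mates were mappable.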

However, after analyzing around 10 paired-end, long, polyA+ RNA-Seq datasets from different tissues (mapped with TopHat and Bowtie), I noticed that the same genes that have FPKM values between 0 and 1 have RPKM values of ~200. I think this difference could cause serious problems in defining accurate expression units and in determining the number of expressed, up-regulated, or down-regulated genes.

I would appreciate any answers or comments on using RPKM over FPKM or vice versa. Gracias! :)

Updated 2.7 years ago by Ram 45k

Just to make sure: if I have paired-end reads, and one read maps but the other does not, do I count it as one fragment? And if both reads map, do I also count it as one fragment? (Otherwise I do not understand how we could double-count some fragments when counting raw reads.) Thank you very much for the explanation.
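One way to picture the counting asked about here: in SAM/BAM output both mates of a pair share the same query name (QNAME), so counting distinct names per gene gives fragments, while counting alignment records gives reads. A minimal Python sketch on hypothetical, already-parsed records, assuming both mates of a fragment are assigned to the same gene and ignoring BAM parsing, secondary alignments, and multi-mapping:

```python
from collections import defaultdict

# Hypothetical aligned records: (query name shared by both mates, mate number, gene).
# Fragment "f2" has only mate 1 aligned; the other fragments have both mates aligned.
alignments = [
    ("f1", 1, "geneA"), ("f1", 2, "geneA"),
    ("f2", 1, "geneA"),
    ("f3", 1, "geneB"), ("f3", 2, "geneB"),
]

def count_reads_and_fragments(records):
    """Return (per-gene read counts, per-gene fragment counts).

    Reads: every aligned mate counts once.
    Fragments: mates sharing a query name collapse into one fragment.
    """
    reads = defaultdict(int)
    fragment_names = defaultdict(set)
    for name, _mate, gene in records:
        reads[gene] += 1
        fragment_names[gene].add(name)
    fragments = {gene: len(names) for gene, names in fragment_names.items()}
    return dict(reads), fragments

reads, fragments = count_reads_and_fragments(alignments)
print(reads)      # geneA: 3 reads, geneB: 2 reads
print(fragments)  # geneA: 2 fragments, geneB: 1 fragment
```

Under these assumptions, a fragment counts once whether one or both mates aligned, whereas read counting gives f1 and f3 two each but f2 only one.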

12.4 years ago by Biomonika (Noolean) 3.2k

Use neither.

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.
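A small worked illustration of the relative-measurement point, using hypothetical molecule counts and lengths and assuming fragments are drawn in proportion to molecules times transcript length (expected values, not random draws). geneA and geneB have identical amounts in both samples and only geneC changes, yet their FPKM drops sharply in sample 2 because FPKM divides by each library's own total:

```python
def expected_fpkm(molecules, lengths_bp, depth):
    """Expected FPKM per gene when `depth` fragments are drawn in proportion
    to molecules * length (longer transcripts yield more fragments)."""
    weights = {g: molecules[g] * lengths_bp[g] for g in molecules}
    total_weight = sum(weights.values())
    fpkm = {}
    for g in molecules:
        fragments = depth * weights[g] / total_weight
        fpkm[g] = fragments * 1e9 / (lengths_bp[g] * depth)
    return fpkm

# Hypothetical transcript lengths (bp) and molecule counts per sample.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
sample1 = {"geneA": 100, "geneB": 100, "geneC": 100}
# Same amounts of geneA and geneB; only geneC is strongly up-regulated.
sample2 = {"geneA": 100, "geneB": 100, "geneC": 10000}

fpkm1 = expected_fpkm(sample1, lengths, depth=1_000_000)
fpkm2 = expected_fpkm(sample2, lengths, depth=1_000_000)
for g in lengths:
    print(f"{g}: sample1={fpkm1[g]:.0f}  sample2={fpkm2[g]:.0f}")
```

Comparing geneA's FPKM across the two samples would suggest a large down-regulation that never happened.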

6.9 years ago by Kevin Blighe 89k

So what should be used?

5.8 years ago by randalljellis ▴ 100

You could normalise your raw counts using edgeR or DESeq2. If you need to export data for downstream analyses, my preference is always the regularised log or variance-stabilised expression values from DESeq2.
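For intuition about what count-based normalization does, here is a from-scratch Python sketch of the median-of-ratios idea behind DESeq2's size factors (`estimateSizeFactors` in the R package). It is not DESeq2's code and covers only the size-factor step, not the regularised-log or variance-stabilising transforms. The counts are hypothetical, and genes with a zero in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median

def median_of_ratios_size_factors(counts):
    """counts: dict gene -> list of raw counts, one entry per sample.

    1. Per gene, take the geometric mean across samples (a pseudo-reference).
    2. Per sample, take the median over genes of count / pseudo-reference.
    Genes with a zero count in any sample are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if min(gene_counts) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical raw counts for 4 genes in 2 samples; sample 2 was sequenced ~2x deeper.
raw = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [10, 20],
    "geneD": [0, 5],  # skipped when estimating size factors: zero in sample 1
}

size_factors = median_of_ratios_size_factors(raw)
normalized = {g: [c / s for c, s in zip(cs, size_factors)] for g, cs in raw.items()}
print(size_factors)
print(normalized)
```

Each sample's counts are divided by its size factor, so depth differences cancel without dividing by a total that a few highly expressed genes can dominate. For real analyses, use DESeq2 or edgeR themselves.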

5.8 years ago by Kevin Blighe 89k

Please read this article:

http://bioinfogeek.over-blog.com/2017/09/gene-expression-units-explained-rpm-rpkm-fpkm-and-tpm.html

7.9 years ago by Renesh ★ 2.2k

score 10 · Answer 1 · 2013-04-03

I think FPKM is the conceptually cleaner way to go, and thus is the preferred term. The rationale is that one is inferring the expression level of a gene (the concentration of a transcript) from observations of fragments of that transcript. Whether a fragment's presence is established from 1 read or 2 reads is simply a technical concern, outside of the unit definition. Granted, you reported a result where software gives different values for the two units on the same data set, but I would argue that is due to messy implementation. A read is evidence of a fragment; 2 paired-end reads are evidence of a fragment. Evidence of fragments is used to count transcripts. Since both infer fragment counts, I think FPKM is the more general and appropriate term. (That's my opinion, though I'm not sure it helps with your particular quandary.)

Ram · Answer 2 · 2013-04-03

I think there's some confusion in the question and comments here. FPKM are the "fancy" units that cufflinks uses specifically to report its probabilistic estimates of isoform abundances. They don't have direct mappings from individual reads, though of course they are estimated from the read data. The f instead of r is to unify the terminology to data from paired (and higher order) reads.
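To illustrate why such estimates have no direct read-to-value mapping: when isoforms share exons, some fragments are compatible with several isoforms, and an estimator has to apportion them. Below is a toy expectation-maximization (EM) sketch of that general idea under simplifying assumptions (uniform fragment start positions, no effective-length or bias correction). It is not Cufflinks' actual estimation procedure, and the isoform names, lengths, and fragment counts are hypothetical.

```python
def em_isoform_fpkm(fragment_sets, lengths_bp, n_iter=200):
    """Toy EM for isoform abundances, returning an FPKM-style value per isoform.

    fragment_sets: list of sets; each set holds the isoforms a fragment is compatible with.
    lengths_bp: isoform -> length. A fragment is assumed equally likely to start anywhere
    on its isoform, so P(fragment | isoform) = 1 / length.
    """
    isoforms = list(lengths_bp)
    n = len(fragment_sets)
    theta = {i: 1 / len(isoforms) for i in isoforms}  # fraction of fragments per isoform
    for _ in range(n_iter):
        expected = {i: 0.0 for i in isoforms}
        # E-step: split each fragment among its compatible isoforms.
        for compatible in fragment_sets:
            weights = {i: theta[i] / lengths_bp[i] for i in compatible}
            total = sum(weights.values())
            for i, w in weights.items():
                expected[i] += w / total
        # M-step: re-estimate the fraction of fragments from each isoform.
        theta = {i: expected[i] / n for i in isoforms}
    # Expected fragments per isoform, scaled per kilobase and per million fragments.
    return {i: (theta[i] * n) * 1e9 / (lengths_bp[i] * n) for i in isoforms}

# Hypothetical toy data: iso1 and iso2 share exons, so 60 fragments are ambiguous.
lengths = {"iso1": 1000, "iso2": 1500}
fragments = [{"iso1", "iso2"}] * 60 + [{"iso1"}] * 30 + [{"iso2"}] * 10
fpkm = em_isoform_fpkm(fragments, lengths)
print(fpkm)
```

The 60 shared fragments end up split unevenly between the isoforms according to the unique evidence, so neither value corresponds to a count of specific reads.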

For more on this topic see Meaning Of Fpkm Value Used By Cufflinks and here.

So to me, "should I use FPKM" is more accurately "should I use Cufflinks." RPKM would typically be used by a more "direct" analysis that maps reads to specific single exons and yields an exon-level analysis, rather than a more complicated isoform-level analysis with advanced statistical techniques.

With that said, differences between FPKM and RPKM are most likely due to the complicated procedure that Cufflinks follows to estimate isoform abundance, rather than to any paired vs. single counting issue.

Furthermore, I don't think the FPKM vs. RPKM question has any direct bearing to the ENCODE results, as suggested in a comment above.

score 0 · Answer 3 · 2019-11-24

5.7 years ago

Renesh ★ 2.2k

https://reneshbedre.github.io/blog/expression_units.html
