Question

What is RPKM/FPKM > 1 or 3 or 5?

5

6.6 years ago

Susmita Mandal ▴ 110

Hello all,

I have a very basic question. In many papers, the analysis is restricted to genes that pass a threshold such as RPKM/FPKM > 1, 3, or 5. What is this threshold? What does it mean, and how do you calculate it? I'm having trouble understanding this and finding papers or articles that explain it. Any help is appreciated.

Thanks, Susmita

RNA-Seq rpkm fpkm normalization ngs • 15k views

ADD COMMENT • link updated 3.3 years ago by ccfpwll ▴ 10 • written 6.6 years ago by Susmita Mandal ▴ 110

0

See for example this blog post: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/

ADD REPLY • link 6.6 years ago by WouterDeCoster 48k

0

For a nice explanation, also see StatQuest

ADD REPLY • link 6.6 years ago by ATpoint 89k

0

I think this is a really good question that wet lab biologists care about: what threshold of RNA count (in any normalized form) corresponds to detectable protein expression (by western blot or flow cytometry)?

ADD REPLY • link 3.3 years ago by ccfpwll ▴ 10

score 2 · Answer 1 · 2019-01-15

2

6.6 years ago

Devon Ryan 105k

The threshold itself is pretty arbitrary and should be based off of your own data. In general, what people are trying to do with this is to look at only "expressed" genes, for some hopefully reasonable meaning of expressed.

RPKM/FPKM is computed as follows:

"number of reads" / "length of gene or region in kb" / (total reads in millions)

For paired-end data, substitute "number of fragments" for "number of reads". You can also get these values from a number of programs, such as StringTie (I think RSEM produces them too, but don't quote me on that).
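As a minimal sketch of the formula above, here is the calculation in Python. The function name `rpkm` and the example numbers are made up for illustration; for paired-end data you would pass fragment counts instead of read counts to get FPKM.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads (or fragments) per kilobase of gene per million mapped reads."""
    length_kb = gene_length_bp / 1_000               # gene length in kb
    total_millions = total_mapped_reads / 1_000_000  # library size in millions
    return read_count / length_kb / total_millions


# Hypothetical example: 500 reads on a 2,000 bp gene,
# in a library of 20 million mapped reads.
value = rpkm(500, 2_000, 20_000_000)
print(value)  # 500 / 2 kb / 20 million = 12.5
```

With a cutoff of 1, 3, or 5, this hypothetical gene would count as "expressed" in every case.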

ADD COMMENT • link 6.6 years ago by Devon Ryan 105k

0

And how do you decide which ones are the "expressed" genes?

ADD REPLY • link 6.6 years ago by Susmita Mandal ▴ 110

0

Those which have their RPKM/FPKM above a certain threshold are considered "expressed".

ADD REPLY • link 6.6 years ago by WouterDeCoster 48k

0

Using an arbitrary cutoff on these expression values - as you say typically 1, 3 or 5.

ADD REPLY • link 6.6 years ago by Kristoffer Vitting-Seerup ★ 4.2k

0

Does this cutoff mean that all the genes in a particular sample have an RPKM of at least this value?

ADD REPLY • link 6.6 years ago by Susmita Mandal ▴ 110

0

Yes, for the genes that remain after filtering. You filter the RPKM values to keep only genes with expression above that cutoff.
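The filtering step can be sketched in a few lines of Python. The gene names and FPKM values below are invented for illustration, and the strict `>` comparison is an assumption; some analyses use `>=` instead.

```python
def expressed_genes(fpkm_by_gene, threshold=1.0):
    """Keep only genes whose FPKM/RPKM is strictly above the threshold."""
    return {gene: value for gene, value in fpkm_by_gene.items() if value > threshold}


# Hypothetical per-gene FPKM values for a single sample.
sample = {"geneA": 0.2, "geneB": 1.0, "geneC": 3.7, "geneD": 12.5}

for cutoff in (1, 3, 5):
    kept = expressed_genes(sample, cutoff)
    print(cutoff, sorted(kept))
# 1 ['geneC', 'geneD']
# 3 ['geneC', 'geneD']
# 5 ['geneD']
```

Note how the set of "expressed" genes shrinks as the cutoff rises, which is why the choice of threshold should be checked against your own data.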

ADD REPLY • link 6.6 years ago by WouterDeCoster 48k

1

Important to remember, though, that, due to the way that these units are derived, the values are not cross comparable across samples.

To derive RPKM/FPKM expression units, samples are only normalised 'within themselves'; there is no cross-sample normalisation. Thus, due to external factors that this normalisation method does not control for, a value of 10 in one sample is not the same as a value of 10 in another. For the same reason, these units are not suitable for differential expression analysis, and you should abandon them if your aim is to conduct differential expression.

ADD REPLY • link 6.6 years ago by Kevin Blighe 89k

0

What would you suggest instead?

ADD REPLY • link 6.6 years ago by Susmita Mandal ▴ 110

0

Obtain the raw counts, if you can, and then use edgeR or DESeq2 to perform normalisation and differential expression comparisons.

ADD REPLY • link 6.6 years ago by Kevin Blighe 89k

score 2 · Answer 2 · 2019-01-15

As mentioned, the purpose is to set a cutoff for what is considered 'expressed'. This is also where TPM (transcripts per million) started to become more popular than RPKM/FPKM, since the aim is to quantify the expression of a complete transcript. What counts as a good cutoff is debated among analysis groups.

The Sequencing Quality Control (SEQC) consortium (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4810084/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321899/) is an FDA-led group that was put together because pharmaceutical companies were submitting RNA-Seq results rather than microarray data as proof of expression. This group did a fairly good assessment of the consistency of RNA-Seq data and of relative cutoffs. They reported that expression as low as 1 FPKM was verifiable by RT-PCR. It is also well known that variability in RNA-Seq data increases greatly at lower expression levels.
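For readers comparing the two units, TPM can be obtained from FPKM within a single sample by rescaling the values so that they sum to one million. The sketch below illustrates that rescaling; the function name `fpkm_to_tpm` and the input values are hypothetical.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm_values)
    if total == 0:
        # No expression at all: avoid dividing by zero.
        return [0.0 for _ in fpkm_values]
    return [value / total * 1_000_000 for value in fpkm_values]


# Hypothetical FPKM values for three transcripts in one sample.
tpm = fpkm_to_tpm([1.0, 3.0, 6.0])
print(tpm)       # each value is its share of the total, scaled to one million
print(sum(tpm))  # approximately 1,000,000 by construction
```

Because the rescaling depends on the sample's total, an FPKM cutoff of 1 does not map to one fixed TPM value across samples.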