Question

Why do we usually use normalized values from RNA-Seq (FPKM, RPKM, etc.)?

1

8.0 years ago

ebrudermanver ▴ 100

I don't have much experience with RNA-Seq, but I see that data is usually published as FPKM values rather than as raw counts. What is the reason for that? Is it only so that we can model the values with a log-Gaussian distribution rather than a discrete distribution such as Poisson or negative binomial? Or does it serve to make the data more accurate and reliable?

RNA-Seq • 3.9k views

updated 8.0 years ago by Michael 56k • written 8.0 years ago by ebrudermanver ▴ 100

score 3 · Answer 1 · 2017-09-03

The reason for FPKM is mostly historical as there are practically only disadvantages in distributing the data this way.

There are several posts and publications showing that FPKM is inferior to other units.
FPKM is not directly compatible with most DE packages.
Providing raw counts would instead allow anyone to compute the transformation they wanted (CPM, TPM, FPKM), while the FPKM transformation is not easily reversible.
FPKM carries over biases and errors from the gene prediction; in particular, it is not suitable for draft genomes, where exons are often poorly annotated.
FPKM values need to be represented as floating-point numbers, which introduces unnecessary rounding errors and possibly extra data volume, whereas counts can be represented as integers.
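
To illustrate the point about raw counts, here is a minimal sketch of how CPM, FPKM and TPM could each be derived from counts plus feature lengths. The function names and toy numbers are made up for illustration, and the library size is assumed to be the sum of the counts listed (i.e. every mapped fragment falls on one of these genes).

```python
def cpm(counts):
    """Counts per million: scales by library size only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


# Hypothetical toy data: the same sample sequenced at two depths.
lengths_bp = [1000, 4000, 2000]
counts_shallow = [100, 400, 500]
counts_deep = [200, 800, 1000]  # every count doubled

cpm_shallow = cpm(counts_shallow)
tpm_shallow = tpm(counts_shallow, lengths_bp)
fpkm_shallow = fpkm(counts_shallow, lengths_bp)
fpkm_deep = fpkm(counts_deep, lengths_bp)

# Both depths give the same FPKM vector, so the FPKM values alone
# cannot tell us which set of raw counts they came from.
print(fpkm_shallow)
print(fpkm_deep)
```

Going the other way (FPKM back to counts) needs both the feature lengths and the library size that were used, which are often not distributed with the table.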

score 2 · Answer 2 · 2017-09-03

R(F)PKM/TPM values normalize read counts by library size (the total number of reads in a given RNA-seq experiment) and by the length of the feature (gene/transcript). But remember that commonly used software for differential expression analysis (DESeq2/edgeR) takes raw counts as input rather than normalized values (these packages perform their own internal normalization steps).

score 0 · Answer 3 · 2017-09-03

8.0 years ago

ebrudermanver ▴ 100

Okay, I just found that link which says that FPKM makes it possible to compare Gene A to Gene B even if they are of different lengths, and to compare Sample 1 and Sample 2 even if they have different library sizes.
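
A small worked example of that claim, with made-up numbers. Using the usual definition, where $c_g$ is the fragment count for gene $g$, $L_g$ its length in bp, and $N$ the total number of mapped fragments in the sample:

```latex
\[
\mathrm{FPKM}_g = \frac{10^{9}\, c_g}{L_g \, N}
\]

% Hypothetical Sample 1: N = 10^6 mapped fragments.
% Gene A: L = 1000 bp, c = 100.  Gene B: L = 4000 bp, c = 400.
\[
\mathrm{FPKM}_A = \frac{10^{9} \cdot 100}{1000 \cdot 10^{6}} = 100,
\qquad
\mathrm{FPKM}_B = \frac{10^{9} \cdot 400}{4000 \cdot 10^{6}} = 100
\]

% Hypothetical Sample 2: twice the depth, N = 2 \cdot 10^6, Gene A: c = 200.
\[
\mathrm{FPKM}_A^{(2)} = \frac{10^{9} \cdot 200}{1000 \cdot 2 \cdot 10^{6}} = 100
\]
```

So Gene B's fourfold higher count is explained entirely by its fourfold length, and Gene A's doubled count in Sample 2 is explained entirely by the doubled library size.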
