Question

Why don't I see people calculating FPKM from normalized read counts?

9 days ago

shelkmike ★ 1.6k

There are efficient methods for normalizing read counts, such as the median-of-ratios method that DESeq2 uses by default. It would seem more correct to calculate FPKM (or TPM, or RPKM) from normalized read counts rather than from raw ones. I think FPKM calculated from normalized read counts would be an excellent measure of gene expression, because it would account for gene length and also correct for the influence of highly expressed genes. However, people either calculate FPKM from raw read counts, or apply the median-of-ratios correction without converting the result to FPKM. Why is that? Am I missing something?

Tags: normalization, RNA-seq

Written 9 days ago by shelkmike ★ 1.6k • updated 4 days ago by kalavattam ▴ 350
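
One subtlety in the question's idea can be seen in a small sketch. It assumes the textbook FPKM definition (count / (column total / 10^6) / (length / 10^3)) and DESeq2-style median-of-ratios size factors computed on the log scale. The function names are hypothetical, the counts are made up, and details such as effective length are ignored. If you divide each sample's counts by its size factor and then apply the usual per-million step, the size factor cancels against the new column total.

```python
import math


def median(values):
    """Median of a non-empty list (mean of the two middle values if even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: one list of raw per-sample counts per gene.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero; the median is taken on the log scale.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def fpkm_from_column_sums(counts, lengths_bp):
    """Textbook FPKM: count / (column total / 1e6) / (length_bp / 1e3)."""
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] * 1e9 / (totals[j] * length) for j in range(n_samples)]
            for row, length in zip(counts, lengths_bp)]


# Hypothetical toy data: genes 1-3 have exactly twice as many reads in
# sample B as in sample A, and gene 4 is strongly induced in B.
counts = [[100, 200], [50, 100], [30, 60], [1000, 8000]]
lengths_bp = [1000, 2000, 500, 3000]

sf = size_factors(counts)  # about [0.707, 1.414]: ratio 2, gene 4 ignored
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]

raw_fpkm = fpkm_from_column_sums(counts, lengths_bp)
norm_fpkm = fpkm_from_column_sums(normalized, lengths_bp)
# Each sample's normalized counts are its raw counts divided by one
# constant, so that constant cancels against the new column total:
# norm_fpkm equals raw_fpkm up to floating-point rounding.
```

So re-applying the usual per-million step to median-of-ratios-normalized counts silently discards the correction. To keep it, the per-million denominator itself has to be derived from the size factors.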

Scientists stopped using FPKM in general because better measures are available:

FPKM not suitable for DE?

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Reply • 9 days ago by Istvan Albert 102k
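
Here is a toy illustration of the composition bias behind that recommendation. It uses hypothetical counts in plain Python and is not taken from the quoted paper. Genes 1-3 have identical raw counts in both samples, but one highly expressed gene inflates sample B's total, so total-count scaling makes genes 1-3 look 2.5-fold lower in B. Dividing by gene length, as RPKM/FPKM do, rescales a gene by the same constant in every sample, so it does not remove this effect.

```python
def cpm(sample_counts):
    """Counts per million using the raw column total (total-count scaling)."""
    total = sum(sample_counts)
    return [c * 1e6 / total for c in sample_counts]


# Hypothetical toy data: genes 1-3 have identical raw counts in both
# samples; only gene 4 differs.
sample_a = [100, 100, 100, 100]
sample_b = [100, 100, 100, 700]

print(cpm(sample_a))  # [250000.0, 250000.0, 250000.0, 250000.0]
print(cpm(sample_b))  # [100000.0, 100000.0, 100000.0, 700000.0]
```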

For anyone interested, this is the first paper to point out that RPKM (and, by extension, FPKM) normalization is inconsistent across samples due to variations in total RNA output and transcript composition (PMID 22872506). ~~The authors introduced TPM as an alternative.~~ That was in 2012—not long after the introductions of RPKM in 2008 (PMID 18516045) and FPKM in 2010 (PMID 20436464).

Reply • 4 days ago by kalavattam ▴ 350
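
One visible symptom of that inconsistency is that RPKM values within a sample do not add up to a fixed total. As a result, the same RPKM can correspond to a different share of the transcript pool in different samples, whereas TPM sums to 10^6 by construction. The sketch below uses hypothetical counts and lengths (two samples with the same number of reads but different composition) and plain-Python helpers with made-up names.

```python
def rpkm(sample_counts, lengths_bp):
    """Reads per kilobase per million mapped reads for one sample."""
    total = sum(sample_counts)
    return [c * 1e9 / (total * length)
            for c, length in zip(sample_counts, lengths_bp)]


def tpm(sample_counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(sample_counts, lengths_bp)]
    denom = sum(rates)
    return [r * 1e6 / denom for r in rates]


# Hypothetical toy data: same genes, same total reads (700), different composition.
lengths_bp = [1000, 2000, 4000]
sample_a = [100, 200, 400]
sample_b = [400, 200, 100]

# RPKM totals differ between samples (about 428571 vs about 750000) ...
print(sum(rpkm(sample_a, lengths_bp)), sum(rpkm(sample_b, lengths_bp)))
# ... while TPM totals are 1e6 in both (up to rounding).
print(sum(tpm(sample_a, lengths_bp)), sum(tpm(sample_b, lengths_bp)))
```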

Bo Li and Colin Dewey (the developers of RSEM) introduced TPM in 2009, even before the Cufflinks paper introduced FPKM. https://academic.oup.com/bioinformatics/article/26/4/493/243395

However, I think that 2012 paper you linked to formalized the relationship between RPKM and TPM.

Reply • 5 days ago by dsull ★ 7.4k
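
The RPKM/TPM relationship can be written out in a few lines. The notation here is assumed rather than taken from either paper: X_i is the number of reads assigned to gene i, l_i is its length in base pairs (the same length definition in both formulas, with effective-length corrections ignored), and N = Σ_j X_j is the total number of mapped reads in the sample.

$$
\mathrm{RPKM}_i = \frac{X_i \cdot 10^9}{l_i \, N},
\qquad
\mathrm{TPM}_i = \frac{X_i / l_i}{\sum_j X_j / l_j} \cdot 10^6
\quad\Longrightarrow\quad
\mathrm{TPM}_i = \frac{\mathrm{RPKM}_i}{\sum_j \mathrm{RPKM}_j} \cdot 10^6
$$

The factor 10^9 / N is shared by every gene in a sample, so it cancels in the ratio. This is also why TPM, unlike RPKM, always sums to 10^6 within a sample.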

Thank you for correcting me. I've updated the post.

Reply • 4 days ago by kalavattam ▴ 350

score 4 · Answer 1 · 2025-04-16

Probably a general lack of awareness. People use what is prominent, and there are many posts, and also many misconceptions, about all these metrics. Few people actually understand why simple per-million scaling is often poor.

I tried addressing this in some posts, e.g.:

Of note, both FPM and FPKM-like metrics are available in combination with proper normalization methods that account for the composition bias, for example:

- edgeR::rpkm(), provided calcNormFactors() was run on the input DGEList beforehand
- DESeq2::fpkm(), provided size factors were estimated beforehand
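
As far as I understand them, both functions follow the same pattern. They keep the raw counts but replace the raw column total in the per-million step with a normalized library size. For edgeR's rpkm() on a DGEList, that library size is lib.size × norm.factors (normalized.lib.sizes = TRUE is the default). For DESeq2's fpkm() with robust = TRUE, my reading is that it uses each size factor times the geometric mean of the raw column sums; treat that detail as an assumption. The plain-Python sketch below mirrors this pattern on hypothetical toy data. The helper names are made up, and TMM itself is not implemented: normalization factors and size factors are taken as inputs.

```python
import math


def column_sums(counts):
    """Raw per-sample totals; counts is one list of per-sample counts per gene."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def fpkm_with_lib_sizes(counts, lengths_bp, lib_sizes):
    """FPKM-like values: count / (lib_size / 1e6) / (length_bp / 1e3)."""
    return [[row[j] * 1e9 / (lib_sizes[j] * length) for j in range(len(lib_sizes))]
            for row, length in zip(counts, lengths_bp)]


def deseq2_style_lib_sizes(counts, size_factors):
    """Size factor times the geometric mean of the raw column sums.

    Modelled on my reading of DESeq2's fpm(robust = TRUE); an assumption.
    """
    totals = column_sums(counts)
    geo_mean_total = math.exp(sum(math.log(t) for t in totals) / len(totals))
    return [sf * geo_mean_total for sf in size_factors]


def edger_style_lib_sizes(counts, norm_factors):
    """Raw column sum times a normalization factor (e.g. from TMM)."""
    return [t * f for t, f in zip(column_sums(counts), norm_factors)]


# Hypothetical toy data; the size factors are assumed to have been estimated
# already (these are the median-of-ratios values for this table).
counts = [[100, 200], [50, 100], [30, 60], [1000, 8000]]
lengths_bp = [1000, 2000, 500, 3000]
size_factors = [math.sqrt(0.5), math.sqrt(2.0)]

robust = fpkm_with_lib_sizes(counts, lengths_bp,
                             deseq2_style_lib_sizes(counts, size_factors))
naive = fpkm_with_lib_sizes(counts, lengths_bp, column_sums(counts))
# In `robust`, genes 1-3 come out equal across the two samples; in `naive`
# they look about 3.5-fold lower in sample B because gene 4 inflates B's total.
```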

Worth watching: