Why don't I see people calculating FPKM from normalized read counts?
9 days ago
shelkmike ★ 1.6k

There are efficient methods for normalizing read counts, such as the median of ratios method used by default in DESeq2. It would seem more correct to calculate FPKM (or TPM, or RPKM) from normalized read counts rather than from raw read counts. FPKM calculated from normalized read counts would, I think, be an excellent metric of gene expression, because it would both account for gene length and correct for the influence of highly expressed genes. However, people either calculate FPKM from raw read counts or apply the median of ratios correction without converting the result to FPKM. Why is that so? Am I missing something?

normalization RNA-seq • 672 views

Scientists stopped using FPKM in general because better measures are available:

FPKM not suitable for DE?

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.


For anyone interested, this is the first paper to point out that RPKM (and, by extension, FPKM) normalization is inconsistent across samples due to variations in total RNA output and transcript composition (PMID 22872506). The authors introduced TPM as an alternative. That was in 2012, not long after the introduction of RPKM in 2008 (PMID 18516045) and of FPKM in 2010 (PMID 20436464).


Bo Li and Colin Dewey (the developers of RSEM) introduced TPM in 2009, even before the Cufflinks paper introduced FPKM. https://academic.oup.com/bioinformatics/article/26/4/493/243395

However, I think that 2012 paper you linked to formalized the relationship between RPKM and TPM.
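
For reference, the relationship being discussed can be written out as follows. This is a sketch under simplifying assumptions (no effective-length correction); the symbols are introduced here for illustration: c_i is the number of reads assigned to gene i, \ell_i its length in base pairs, and N the total number of mapped reads in the sample.

```latex
\mathrm{RPKM}_i = \frac{10^{9}\, c_i}{N\, \ell_i},
\qquad
\mathrm{TPM}_i = 10^{6}\,\frac{c_i/\ell_i}{\sum_j c_j/\ell_j}
             = 10^{6}\,\frac{\mathrm{RPKM}_i}{\sum_j \mathrm{RPKM}_j}
```

By construction, TPM values sum to 10^6 in every sample, whereas the sum of RPKM values depends on the sample's transcript composition, which is why RPKM values are not directly comparable between samples.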


Thank you for correcting me. I've updated the post.

9 days ago
ATpoint 87k

Probably a general lack of awareness. People use what is prominent, and there are a lot of posts, and also misconceptions, about all these metrics. Few people actually understand why simple per-million scaling is often poor.
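
A toy illustration of why per-million scaling can mislead may help here. The sketch below is plain Python, not the DESeq2 implementation; the function names and the counts are hypothetical. Four genes are identical between two samples, and a fifth gene is ten times higher in the second sample. Per-million scaling makes the unchanged genes look strongly down-regulated, while a median-of-ratios style size factor leaves them unchanged.

```python
import math
from statistics import median


def cpm(counts):
    """Scale one sample's counts to counts per million of its own total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def median_of_ratios_size_factors(samples):
    """Size factors following the median-of-ratios idea (simplified sketch).

    samples: list of per-sample count lists, all in the same gene order.
    Genes with a zero count in any sample are left out of the reference.
    """
    n_genes = len(samples[0])
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable
    }
    # Per sample: median log-ratio to the reference, back-transformed.
    return [
        math.exp(median(math.log(s[g]) - log_ref[g] for g in usable))
        for s in samples
    ]


# Hypothetical toy counts: genes 0-3 are unchanged, gene 4 is 10x higher in sample2.
sample1 = [100, 200, 300, 400, 1000]
sample2 = [100, 200, 300, 400, 10000]

cpm1, cpm2 = cpm(sample1), cpm(sample2)
sf = median_of_ratios_size_factors([sample1, sample2])
norm1 = [c / sf[0] for c in sample1]
norm2 = [c / sf[1] for c in sample2]

print("CPM of gene 0:", cpm1[0], cpm2[0])
print("Size-factor-normalized gene 0:", norm1[0], norm2[0])
```

In this toy case the single highly expressed gene inflates the total of the second sample, so dividing by the total shrinks every other gene; the median of the ratios is not affected by that one gene.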

I tried addressing this in some posts, e.g.:

Of note, both FPM and FPKM-like metrics are available in combination with proper normalization methods that account for the composition bias, for example:

  • edgeR::rpkm(), provided calcNormFactors() was run on the input DGEList beforehand
  • DESeq2::fpkm(), provided size factors were estimated beforehand
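
The idea behind combining the two steps can be sketched in plain Python. This is an illustration only, not the code of either package: the function names, toy counts and gene lengths are hypothetical, and the choice of effective library size (size factor times the geometric mean of the raw library sizes) is an assumption of this sketch, made so that the result stays on a per-million scale.

```python
import math
from statistics import median


def size_factors(samples):
    """Median-of-ratios style size factors (simplified sketch)."""
    n_genes = len(samples[0])
    # Only genes with a positive count in every sample enter the reference.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    log_ref = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable
    }
    return [
        math.exp(median(math.log(s[g]) - log_ref[g] for g in usable))
        for s in samples
    ]


def fpkm_from_normalized(samples, lengths_bp):
    """FPKM-like values using composition-aware library sizes.

    Assumption of this sketch: effective library size is the size factor
    multiplied by the geometric mean of the raw library sizes.
    """
    sf = size_factors(samples)
    raw_lib = [sum(s) for s in samples]
    gm_lib = math.exp(sum(math.log(n) for n in raw_lib) / len(raw_lib))
    eff_lib = [f * gm_lib for f in sf]
    # fragments per kilobase per million: count * 1e9 / (library * length in bp)
    return [
        [c * 1e9 / (lib * length) for c, length in zip(s, lengths_bp)]
        for s, lib in zip(samples, eff_lib)
    ]


# Hypothetical toy data: genes 0-3 unchanged, gene 4 is 10x higher in sample2.
sample1 = [100, 200, 300, 400, 1000]
sample2 = [100, 200, 300, 400, 10000]
lengths_bp = [1000, 2000, 1500, 4000, 2000]

fpkm = fpkm_from_normalized([sample1, sample2], lengths_bp)
for name, values in zip(["sample1", "sample2"], fpkm):
    print(name, [round(v, 1) for v in values])
```

With these toy numbers the unchanged genes receive the same value in both samples, and the changed gene keeps its tenfold difference, which is the behaviour the question asks for.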

Worth watching:
