Hi all,
I have come across a gene which shows a 7-fold increase in Treatment (3 replicates) vs Control (3 replicates) when looking at FPKMs (I took the mean of Treatment / mean of Control to get a fold change).
But when doing differential expression (using counts from htseq-count), I am getting a log2 fold change of -0.5, which means the Treatment has lower expression compared to the Controls.
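To make it concrete, this is roughly the comparison I'm making (a minimal sketch with made-up FPKM values, just to show the two calculations; the actual numbers come from my pipeline):

```python
import numpy as np

# Hypothetical per-replicate FPKM values for one gene (illustration only)
fpkm_treatment = np.array([14.0, 21.0, 35.0])  # 3 Treatment replicates
fpkm_control = np.array([2.0, 4.0, 4.0])       # 3 Control replicates

# What I did: mean(Treatment FPKM) / mean(Control FPKM)
fpkm_fold_change = fpkm_treatment.mean() / fpkm_control.mean()
print(f"FPKM-based fold change: {fpkm_fold_change:.1f}")  # ~7

# What the count-based DE analysis reports for the same gene
log2fc_from_counts = -0.5
print(f"count-based fold change: {2 ** log2fc_from_counts:.2f}")  # ~0.71, i.e. lower in Treatment
```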
There are a couple more such instances that show discordance between the differential expression results and the FPKMs.
Any suggestions?
Thanks,
Ron
I'll add that FPKMs are sometimes sequencing-depth normalized in incredibly problematic and non-robust ways. As a general rule, DESeq2/edgeR/limma are to be trusted over anything the FPKMs say.
So, which result should I go with in this case: the fold change from FPKMs or the fold change from DESeq?
Don't use fold changes from FPKM for any type of quantitative analysis. They will not be properly normalized for comparisons between samples or replicates. You should instead look at fold changes that normalize for these differences. In a "standard" analysis, DESeq/DESeq2/edgeR/limma will perform such a normalization.
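To give a sense of what that normalization does, here is a rough sketch of the median-of-ratios size-factor idea that DESeq uses, written out by hand on a toy count matrix (the numbers are made up, and the real packages additionally handle zero counts, dispersion estimation, and fold-change shrinkage):

```python
import numpy as np

# Toy count matrix: rows = genes, columns = samples (made-up numbers; no zeros for simplicity)
counts = np.array([
    [100, 200, 150, 300],
    [ 50, 120,  60, 110],
    [ 10,  25,  12,  20],
    [500, 900, 480, 950],
], dtype=float)

# 1) per-gene geometric mean across samples (a pseudo-reference sample)
log_geo_means = np.log(counts).mean(axis=1)

# 2) per-sample ratios to that pseudo-reference, in log space
log_ratios = np.log(counts) - log_geo_means[:, None]

# 3) size factor = median ratio per sample
size_factors = np.exp(np.median(log_ratios, axis=0))
print("size factors:", size_factors.round(3))

# Counts divided by their size factor are comparable across samples,
# which is the scale on which fold changes should be computed
norm_counts = counts / size_factors
print(norm_counts.round(1))
```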
agreed.
at the moment best practices are:
deseq - library size correction and vst
limma - voom / cpm calculation with some sort of normalization method (e.g., quantile)
edger - not entirely sure, not too familiar with it.
i tend to use tpm a lot as well, as it doesn't have the same issues as FPKM and RPKM.
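for reference, the difference between FPKM and TPM comes down to the order of normalization; a quick sketch with made-up counts and gene lengths (standard formulas, nothing tool-specific):

```python
import numpy as np

# Toy data for one sample: counts and gene lengths in kb (made-up numbers)
counts = np.array([500_000.0, 1_000_000.0, 1_500_000.0])
length_kb = np.array([2.0, 0.5, 3.0])

# FPKM: divide by length, then by total mapped reads (in millions)
fpkm = (counts / length_kb) / (counts.sum() / 1e6)

# TPM: divide by length first, then rescale so each sample sums to one million
rate = counts / length_kb
tpm = rate / rate.sum() * 1e6

print("FPKM:", fpkm.round(1))
print("TPM :", tpm.round(1), "-> sums to", tpm.sum())  # always 1e6, which makes TPMs easier to compare across samples
```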
agreed with the agreement ;P. I'd only offer the caveat that, while TPM is universally preferable to F/RPKM, it is still a purely relative abundance metric. For this reason alone, it's not really appropriate to use as-is for quantitative comparison between samples/conditions.
eh, technically RNA-seq is only relative quantification without ERCC spike-ins, and even then it's somewhat iffy; the FDA SEQC project has some good work on this issue. also, both TPM and CPM can be used for between-sample comparisons because they are normalized per million reads, so that should be fine, but i think both are still biased towards larger transcripts. in the past i've found it's a good idea to use the 0.4 quantile as a cutoff for DEG analysis, or to use normalized counts from similarly sized intergenic regions as an indication of sequencing noise.
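to illustrate the kind of cutoff i mean (a rough sketch only; the CPM-based filter and the simulated counts are just for illustration, not a standard recipe):

```python
import numpy as np

# toy count matrix: 1000 genes x 6 samples (simulated, just for illustration)
rng = np.random.default_rng(0)
counts = rng.negative_binomial(n=5, p=0.01, size=(1000, 6)).astype(float)

# counts per million (per-sample library size normalization)
cpm = counts / counts.sum(axis=0) * 1e6

# drop the bottom 40% of genes by mean CPM before DEG testing
gene_means = cpm.mean(axis=1)
cutoff = np.quantile(gene_means, 0.4)
keep = gene_means > cutoff
print(f"cutoff = {cutoff:.1f} CPM, keeping {keep.sum()} of {len(keep)} genes")
```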
True, but spike-ins can exhibit crazy variability/imprecision. Theoretically they are great, but their practical utility depends to a large degree on the skill of the person preparing the samples.
yep, completely agree. massively parallel sequencing is unfortunately very noisy, especially for smaller and low-abundance features, and i think this problem scales linearly with sequencing depth. in my opinion it's still a very "experimental" technology.
Okay, another thing I wanted to know: is it the norm to see such differences (have you guys come across such differences in the past too)?
I've never seen differential expression ... ever ... :D. Just kidding. It depends. If your gene annotation has, say, 15,000 genes, and 14,000 of them are differentially expressed, then yeah, there's probably a problem. What you want are sanity checks: do you see something that you expect to see, i.e. a positive control?