Question

How is Log2FC calculated in a paired analysis

4.8 years ago

tbkuipers ▴ 30

Hi all,

I have done a paired analysis (VTE1 vs VTE0 + pair) on RNA-Seq data and an unpaired analysis (VTE1 vs VTE0) using edgeR. I noticed that the Log2FC values differ between the two analyses. I guess this is caused by the difference in the design. My samples look like this:

Samples VTE pair
sample1 0   pair_1
sample2 1   pair_1
sample3 0   pair_2
sample4 1   pair_2
sample5 0   pair_3
sample6 1   pair_3

In the unpaired analysis, fold changes are calculated as log2(mean(VTE1)/mean(VTE0)). However, I have no idea how the FC values are calculated when the pairs are added to the design.

Hope my question is clear and someone can explain this to me!

Best regards, Tom

RNA-Seq Fold Change FC log2fc analysis • 3.1k views

ADD COMMENT • link updated 4.8 years ago by i.sudbery 22k • written 4.8 years ago by tbkuipers ▴ 30

https://support.bioconductor.org/p/134493/

ADD REPLY • link 4.8 years ago by Gordon Smyth ★ 8.3k

Hi @tbkuipers, it's usually considered bad manners to cross-post the same question to two different sites at the same time, as it can lead to duplicated effort from a limited pool of people answering. In future, it's more polite to post to just one site. If you don't have an answer there after an appropriate time (say a couple of days), it would then be appropriate to post to another.

ADD REPLY • link 4.8 years ago by i.sudbery 22k

Hi, I understand. I have removed the other post.

ADD REPLY • link 4.8 years ago by tbkuipers ▴ 30

score 0 · Answer 1 · 2020-10-07

The simplest way to calculate log fold change in a paired experiment would be to calculate the logFC for each pair and then take the mean of that.
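As a minimal sketch of that simple approach, the Python below uses made-up expression values for one gene (hypothetical, not taken from the data in the question). It computes the log2 fold change within each pair and averages them, and for comparison also computes the unpaired log2(mean(VTE1)/mean(VTE0)) described in the question.

```python
import math

# Hypothetical normalised expression values for one gene (illustration only).
# Each tuple is (VTE0, VTE1) for one pair.
pairs = {
    "pair_1": (100.0, 200.0),
    "pair_2": (10.0, 40.0),
    "pair_3": (1000.0, 1000.0),
}

# Paired: log2 fold change within each pair, then the mean of those values.
per_pair_logfc = [math.log2(vte1 / vte0) for vte0, vte1 in pairs.values()]
paired_logfc = sum(per_pair_logfc) / len(per_pair_logfc)

# Unpaired: log2 of the ratio of the group means, as described in the question.
mean_vte0 = sum(vte0 for vte0, _ in pairs.values()) / len(pairs)
mean_vte1 = sum(vte1 for _, vte1 in pairs.values()) / len(pairs)
unpaired_logfc = math.log2(mean_vte1 / mean_vte0)

print("per-pair log2FC:", per_pair_logfc)
print("paired log2FC:  ", paired_logfc)
print("unpaired log2FC:", unpaired_logfc)
```

With these invented numbers the per-pair values are 1, 2 and 0, giving a paired estimate of 1, while the unpaired estimate is roughly 0.16 because the highly expressed pair_3 dominates the group means. That is one way the two analyses can disagree.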

However, and Gordon will correct me if I'm wrong, this is not what is reported by edgeR. edgeR finds the best values for the parameters in the model:

expression = a + b1 * pair_2 + b2 * pair_3 + b3 * VTE1

where a is the expression in sample1 (VTE0, pair_1) and pair_2, pair_3 and VTE1 are indicator variables saying whether each sample is in pair_2, pair_3 or VTE1. So, for example, their values would be 1, 0 and 1 for sample4 (pair_2, VTE1), and 0, 1 and 0 for sample5 (pair_3, VTE0).
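To make the indicator variables concrete, here is a small sketch that writes them out for the sample sheet in the question. The helper `design_row` is a hypothetical name used only for illustration; it is a hand-built stand-in for the kind of design matrix that would be passed to edgeR, and the thread does not show the exact design formula that was used.

```python
# Sample sheet from the question: (sample, VTE, pair).
samples = [
    ("sample1", 0, "pair_1"),
    ("sample2", 1, "pair_1"),
    ("sample3", 0, "pair_2"),
    ("sample4", 1, "pair_2"),
    ("sample5", 0, "pair_3"),
    ("sample6", 1, "pair_3"),
]

def design_row(vte, pair):
    """Return [intercept, pair_2, pair_3, VTE1] indicator values for one sample.

    pair_1 and VTE0 are the reference levels, so they get no column of their own.
    """
    return [1, int(pair == "pair_2"), int(pair == "pair_3"), int(vte == 1)]

design = {name: design_row(vte, pair) for name, vte, pair in samples}

print("sample   a  pair_2  pair_3  VTE1")
for name, row in design.items():
    print(name, row)
```

Each row multiplies the coefficients a, b1, b2 and b3 in the model above; sample1 has only the intercept switched on, which is why a corresponds to the VTE0, pair_1 baseline.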

What this means is that, effectively, for each VTE1 sample, you take the baseline expression, add on the difference in average expression between its pair and pair_1, and then add on the difference in expression between the average VTE0 and VTE1 sample after the pair correction has been made.

The value of b3 is then what is reported as the logFC.

(It's actually a bit more complicated than this because in edgeR the model isn't a linear model but a GLM; however, the same reasoning applies.)
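The linear-model version of this reasoning can be checked with a short sketch. The code below fits the model by ordinary least squares on made-up log2 expression values (hypothetical data, one gene) and compares b3 with the mean of the within-pair differences. This is plain least squares solved by hand, not edgeR's negative binomial GLM, so the logFC that edgeR reports on real counts will generally not match it exactly.

```python
# Hypothetical log2 expression for one gene, ordered sample1..sample6.
y = [5.0, 6.0, 3.0, 5.0, 10.0, 10.0]

# Design matrix columns: intercept (a), pair_2 (b1), pair_3 (b2), VTE1 (b3).
X = [
    [1, 0, 0, 0],  # sample1: VTE0, pair_1
    [1, 0, 0, 1],  # sample2: VTE1, pair_1
    [1, 1, 0, 0],  # sample3: VTE0, pair_2
    [1, 1, 0, 1],  # sample4: VTE1, pair_2
    [1, 0, 1, 0],  # sample5: VTE0, pair_3
    [1, 0, 1, 1],  # sample6: VTE1, pair_3
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Normal equations for least squares: (X^T X) beta = X^T y.
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
a, b1, b2, b3 = solve(XtX, Xty)

# Within-pair differences (VTE1 minus VTE0) and their mean.
pair_diffs = [y[i + 1] - y[i] for i in range(0, len(y), 2)]
mean_pair_diff = sum(pair_diffs) / len(pair_diffs)

print("b3:", b3)
print("mean within-pair difference:", mean_pair_diff)
```

In a balanced design like this one, with exactly one VTE0 and one VTE1 sample per pair, the least-squares b3 coincides with the mean of the per-pair log differences. The GLM in edgeR follows the same structure but weights and estimates things differently.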