Hello All,
I'm using the limma package to perform some microarray analysis. In the past I've typically compared 2 groups in contrasts such as "GroupA-GroupB". Occasionally, I've even done comparisons where I've normalized each group to its control before comparing the 2 groups, such as "(GroupAExp-GroupAControl)-(GroupBExp-GroupBControl)". In my current analysis I have 4 levels for a particular factor and I'd like to compare 2 of the levels to the other 2 levels. So, my targets file is as follows.
Sample Condition
S1 A
S2 A
S3 A
S4 B
S5 B
S6 B
S7 C
S8 C
S9 C
S10 D
S11 D
S12 D
And basically my contrast is "(A+B)-(C+D)". Which is to say I'd like to find genes that are significantly higher (or lower) in the pooled A and B group compared to the C and D group. Unfortunately, I'm not quite sure I understand the log fold changes that are output, or how limma is calculating them. I could always just create another column/factor in which I put A & B together as one level and C & D as another level, but in truth these 4 levels are distinct. So my questions to the group are
- Is it better to make the contrast as I have, or would it be more appropriate to create a new factor and pool A+B into 1 level and C+D into another level?
- If the contrast I've chosen is correct, how does the logFC get calculated? I know that it's not simply taking all the values in A+B and subtracting all the values in C+D, as that doesn't yield the same result as what I see in the output.
Any thoughts would be much appreciated.
Cheers,
Pichai
Thanks Brent, yeah I actually think I have no choice but to create a new column. The contrast is essentially taking the mean of the first level, adding it to the mean of the second level, and then subtracting the means of the 3rd and 4th levels, which of course inflates the logFC. I assumed, somewhat naively, that it would just average all values in A and B together and subtract the average of all values in C and D.
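To make the arithmetic described above concrete, here is a minimal sketch in plain Python (not limma itself). The expression values are made up for illustration, and it assumes a group-means design in which each coefficient is simply that group's mean. It compares the sum-style contrast "(A+B)-(C+D)" with an averaged version "(A+B)/2-(C+D)/2" and with the difference of pooled means.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene, 3 samples per group
# (made-up numbers, not taken from any real data set).
groups = {
    "A": [8.0, 8.2, 7.8],
    "B": [9.0, 9.1, 8.9],
    "C": [6.0, 6.2, 5.8],
    "D": [7.0, 6.9, 7.1],
}

# Assumption: with a group-means design, each coefficient is the group mean.
m = {name: mean(values) for name, values in groups.items()}

# "(A+B)-(C+D)": sum of two group means minus sum of two group means.
sum_contrast = (m["A"] + m["B"]) - (m["C"] + m["D"])

# "(A+B)/2-(C+D)/2": average of two group means minus average of two group means.
avg_contrast = (m["A"] + m["B"]) / 2 - (m["C"] + m["D"]) / 2

# Pooling: mean of all A and B samples minus mean of all C and D samples.
pooled_diff = mean(groups["A"] + groups["B"]) - mean(groups["C"] + groups["D"])

print("sum contrast:     ", round(sum_contrast, 3))
print("averaged contrast:", round(avg_contrast, 3))
print("pooled difference:", round(pooled_diff, 3))
```

In this sketch the sum-style contrast comes out at twice the averaged one, and the averaged one matches the pooled difference. That match relies on the groups being the same size, as they are in the targets file above; with unequal group sizes the two would generally differ. If I understand the contrast syntax correctly, the averaged form can be written directly as a contrast string, i.e. "(A+B)/2-(C+D)/2", without adding a new column.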
re 1) for simplicity, you could make a new column as you suggest:
and 2) you can find lots of information on how limma calculates logFC, e.g.: https://stat.ethz.ch/pipermail/bioconductor/2012-January/042950.html
If you want A and B to be pooled then you won't be doing a contrast, pretty much by definition.
All I wanted to do was contrast samples that are members of groups A or B with samples that are members of groups C or D. This equates to creating a new factor with 2 levels, e.g. E and F, where E is all samples in A and B, and F is all samples in C and D, so our contrast may be "E-F".
What I did was "(A+B)-(C+D)", which I believe is still a contrast. You're contrasting the sum of A and B with that of C and D, which I'm sure has some use case somewhere (just not for me right now). Is there another term for this expression, if it's not a contrast?
I think your confusion arises from thinking that the contrast (here (A+B)-(C+D)) applies to the raw values rather than to the fitted model. By the time you perform the contrast, you've already fit the model with the 4-level factor and are using its coefficients and their standard deviations. If you pool things, you're not doing a contrast, and you're also decreasing your power, since you're not accounting for a known effect (thus, the variance is increased). Perhaps this ends up making sense for the underlying biological question, perhaps not.
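The point about variance can be illustrated with a small sketch in plain Python. The values are again made up, and this is ordinary per-gene least squares only: it ignores the empirical Bayes moderation that limma applies on top. It compares the residual variance when all 4 groups are modeled with the residual variance when A and B, and C and D, are pooled into 2 groups.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (made-up numbers).
a = [8.0, 8.2, 7.8]
b = [9.0, 9.1, 8.9]
c = [6.0, 6.2, 5.8]
d = [7.0, 6.9, 7.1]


def residual_variance(group_list):
    """Residual sum of squares around each group's mean, divided by
    (number of samples - number of groups)."""
    rss = 0.0
    n = 0
    for values in group_list:
        centre = mean(values)
        rss += sum((v - centre) ** 2 for v in values)
        n += len(values)
    return rss / (n - len(group_list))


# Model 1: four separate groups, so the A/B and C/D differences are fitted.
var_four = residual_variance([a, b, c, d])

# Model 2: A and B pooled, C and D pooled, so those differences end up
# in the residuals.
var_pooled = residual_variance([a + b, c + d])

print("residual variance, 4 groups:", round(var_four, 4))
print("residual variance, pooled:  ", round(var_pooled, 4))
```

With these numbers the pooled model has a much larger residual variance, because the A versus B and C versus D differences are treated as noise. The estimated difference itself is the same as the averaged contrast when the groups are balanced, so the larger variance translates into a smaller test statistic.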