Question

Does the order of the batch variable matter in Limma design matrix?

0

Entering edit mode

9 months ago

CTLong ▴ 120

Hi all,

I come to find that the order of variables in generating a design matrix with batch effect is different in DESeq2 and Limma. Just wondering if I have misinterpreted anything, or is this a discrepancy between the two packages.

Here's how I would model batch effect with DESeq2

dds <- DESeqDataSetFromMatrix(countData = df,
                          colData = metadata, 
                          design = ~ batch + condition)

For Limma I would do the following

design <- model.matrix(~0 + condition + batch)

I'm slightly less confident about the matrix design that I specified for Limma. Just want to ask if this is the correct order in how the variables should be assigned, and whether I should include the 0 preceeded all my variables to be able to specify contrasts, if I am performing pairwise comparisons.

Batch • 1.2k views

ADD COMMENT • link updated 9 weeks ago by Gordon Smyth ★ 7.7k • written 9 months ago by CTLong ▴ 120

0

Entering edit mode

Hi,

On a similar note - i am running limma through DEP2 and found that the order of factors/covariates in the design matrix does affect the final analysis.

This is the code i have used:

design_formula <- formula(~0 + condition + PMI + Sex + Age + Duration)
contrasts2 <- DEP2::test_diff(data_norm2, type = "manual", test = c("M_vs_C"), fdr.type = "BH", design_formula=formula(design_formula))

I have tested a bunch of iterations of the design formula. Using the same combination of factors/covariates but in a different order affected the number of significantly differentially expressed proteins.

In the image i have a summary table showing the number of significant proteins (proteinnum) and the smallest adjusted p val in the dataset (BH) for the different combinations. If you look at the second last and third last row in the table - the same covariates in a different order produce a different result.

Why might this be?

Thanks heaps

enter image description here

ADD REPLY • link updated 9 weeks ago by GenoMax 147k • written 9 weeks ago by alanna.spiteri • 0

0

Entering edit mode

I don't know anything about DEP2 but I can tell you for sure that limma gives the same results regardless of the order of terms in the model (apart from the intercept removal ~0 affecting the first term).

If you are getting different results from the 2nd and 3rd last models, then that indicates a bug either in your code or in DEP2.

ADD REPLY • link 9 weeks ago by Gordon Smyth ★ 7.7k

0

Entering edit mode

Hey Gordan,

Thanks heaps for the prompt reply.

ADD REPLY • link 9 weeks ago by alanna.spiteri • 0

score 1 · Answer 1 · 2024-02-29

1

Entering edit mode

9 months ago

Gordon Smyth ★ 7.7k

limma works with the variables in either order and with or without the 0+. So there is absolutely no reason why you can't use ~ batch + condition. But we often suggest that you use ~0 + condition + batch so that you can construct contrasts between the conditions in an intuitive way. If you remove the intercept by using 0+ then which factor is listed first does matter because you can only remove the intercept once, so it affects only the first factor in the model.

Which way you setup the design matrix is just convenience. It is entirely up to you.

For more discussion, see A guide to creating design matrices for gene expression experiments.

ADD COMMENT • link 9 weeks ago by Gordon Smyth ★ 7.7k

0

Entering edit mode

Hi Gordon, thanks for the advice. I did notice that setting the design ~0 + batch + condition allows me to include all the contrasts I want for my conditions. Whereas ~0 + condition + batch would cause one of my conditions to be excluded from the design matrix to I can't make a contrast with that condition. Good to know that the order does not impact the actual DE anlaysis.

ADD REPLY • link 8 months ago by CTLong ▴ 120

0

Entering edit mode

I think you mean the other way around.

But excluding one of the conditions from the design matrix does not prevent you from making contrasts with that condition. limma can easily test all the pairwise contrasts regardless of which design matrix you use. If the first condition is excluded from the design matrix, you just need a slightly deeper understanding of what the coefficients mean in order to form the pairwise comparisons. As I keep saying, it is entirely a matter of convenience.

ADD REPLY • link 8 months ago by Gordon Smyth ★ 7.7k